PREFACE


In a field already covered by many textbooks, the presentation of a new
book requires special justification.

The study of statistics has two aspects. First, there is a set of operational
procedures to be mastered, leading to standard conclusions drawn according
to standard rules. The student can learn to apply these procedures without
understanding the principles underlying their validity. This, of course,
is the part of the subject which the student must master if he is to make
any practical use of statistics. Secondly, there is a body of mathematical
and logical analysis which demonstrates the validity of the procedures.
It is not necessary to master this mathematical analysis in order to use
statistics; nevertheless, there are cogent reasons for studying it. In the first
place, an understanding of the theoretical basis for a procedure often brings
with it an understanding of the limitations of the procedure, and the
statistician with theoretical training is not likely to apply his procedures
to situations where they are not suitable. In the second place, knowledge
of the theory makes statistics a far more versatile tool in the hands of the
worker. With the theory, he is capable of modifying his procedure to fit
unusual situations, whereas a worker who knows only a set of specific
operational procedures is helpless if the problem contains an unusual or
unexpected ingredient. And in the third place, there is far more intellectual
satisfaction for the student in carrying out procedures which he understands
than there is in carrying out procedures which he accepts on someone
else’s authority.

All textbooks must make a compromise of some sort between these two
aspects of statistics. In general, books for beginners are wholly or mainly
operational, whereas intermediate and advanced books mix theory with
application in varying proportions. To understand most of the principles
used in elementary statistics it is generally necessary for the student to
master books used in advanced courses, which usually require calculus and
sometimes differential equations. Most students, however, take only one
course in statistics and have neither the time nor the mathematical
background to continue to the advanced courses. Their mastery of statistics
consequently is almost entirely operational.

It is the thesis of this book that it is possible and desirable to give
beginning non-technical students a deeper understanding of theory than is
customarily attempted. Many of the principles which ordinarily must be
“taken on faith” by the elementary student are here derived in terms of
simple mathematics not involving calculus. These derivations have been
used with success in classes composed of students in psychology, sociology,
biology, education, and other non-mathematical fields. This book is
written for such students and also for the many workers in industry,
medicine, conservation, and other fields who use statistics in their work
and are curious about the theoretical basis for their tools.

The possibilities of making theoretical mastery available to beginning
students are necessarily limited. Much of the elementary theory of
statistics is intrinsically technical and hopelessly beyond the reach of the
beginner, and much of the remainder can be brought to the beginner only
with some sacrifice of mathematical exactness. In a few cases the
derivations are accomplished by means of simplifications which perhaps conceal
some of the logical difficulties, and in a very few cases derivations are
omitted altogether. But with these minor exceptions the student
participates fully in the development of the subject and verifies for himself the
derivations of all theoretical results. Furthermore, he has an opportunity
to follow the development of the basic ideas and to understand the needs
which led to the formulation of our various statistical concepts.

The data used in the illustrative problems and exercises are fictitious
unless otherwise noted. The use of fictitious data has made it possible to
illustrate the principles of statistics with a minimum of time devoted to
extended computations.

The book is intended for use in a single-semester introductory course of
three or four semester hours. For use in a three-hour course, the following
omissions are suggested: Chapter 2, Article 2; Chapter 3, Articles 6 and 11;
Chapter 5, Articles 4, 5, and 11; Chapter 6, the technical details of Article 4;
Chapter 7, Articles 5, 6, 7, and 8; Chapter 8, Article 9; Chapter 9, Second
Method of Article 11 and all of Article 13; Chapter 11, Article 3.

I wish to record my gratitude to Professors George Starcher and Carl
Denbow for their criticism of the manuscript, and to Misha Goedicke,
Mae Simon, and Norma Albaugh for their assistance in criticizing, typing,
and proofreading. I am also indebted to many of my students in Ohio
University who have worked with the material, and particularly to Eugene
Dunn, Shirley Stevens, and Lawrence Talley, who have checked all the
exercises for numerical errors.

I am indebted to Professor Ronald A. Fisher, Cambridge, to Dr. Frank
Yates, Rothamsted, and to Messrs. Oliver and Boyd Ltd., Edinburgh, for
permission to reprint Table IV from their book Statistical Tables for
Biological, Agricultural, and Medical Research.

Victor Goedicke

January, 1958
OBJECTIVES OF STATISTICS


1. INTRODUCTION 

“Statistics” is a term somewhat loosely used to include various methods
of presentation, analysis, and interpretation of mass numerical data. Some
writers prefer to use the term “statistical methods” for these various
procedures and to reserve the term “statistics” for the data itself, but this
convention will not be followed in this book.

The purpose of this introductory chapter is to explain to you the scope
and purpose of the branch of mathematics which you are about to study.
For this purpose a verbal definition of the word “statistics” is probably
of little use. A more effective procedure is to examine a set of
representative situations which require statistical methods of attack and to see for
yourself what statistics means in practice and what it is used for. The
presentation of such a set of statistical problems is the primary task of
this chapter.

A preliminary orientation of this sort is desirable in any field of
mathematics, but the need is probably greatest in statistics. The transition from
the objectives of other mathematics courses to those of statistics is not
an easy one, and many students soon find themselves floundering
hopelessly, with a feeling that they have missed the central idea. Their specific
comments are likely to describe one or the other of the following
impressions:

(1) Statistics is a grab-bag of unrelated techniques, without an
underlying core of common principles and without a continuous thread of
mathematical development.

(2) The “results” of statistical study are vague and hard to grasp;
the methods rarely give you a flat “yes” or “no” answer to any problem,
but only a diffuse statement about likelihoods or probabilities.

It is the author’s hope that a careful reading of this chapter will give 
you a little insight into the purposes of statistical analysis and help you to 
avoid this typical feeling of disorientation. 
2. NATURE OF STATISTICAL DATA

In most fields of mathematics we deal with a few specific numbers, each
of which is considered to be exactly known and to have a specific relation
to the other numbers involved in the problem. In statistics we usually
deal instead with a collection of numbers, all measuring in some way the
same thing and each containing some element of uncertainty. We then
draw conclusions from the trend of the data as a whole, rather than from
any individual items in it. Furthermore, if two variables are involved,
we do not assume that a specific relationship exists between them; instead
we frequently find that the second variable is only partly controlled by
the first and is partly independent of it. Indeed, it is frequently the duty
of the statistician to measure the amount of relatedness of the two variables,
that is, to separate the part of one variable which is controlled by the
other from the part which is independent of it.

Suppose, for example, that a scientist applies a varying voltage across
a circuit, with the purpose of finding how the voltage affects the amperage,
and obtains the following results:


Trial No.    Voltage    Amperage

    1          120         24
    2          110         22
    3          100         20
    4           90         18
    5          100         20
    6          110         22

He would conclude that, for this circuit, amperage is proportional to
voltage. Since this law works exactly for all his measurements, he can
further conclude that in this situation the amperage is controlled solely
by the voltage. There is no element of uncertainty in the figures, and
statistical analysis would add nothing to his knowledge.

Now let us suppose instead that the apparatus is subject to an irregular
change of temperature and that the resistance of the circuit, unknown
to the scientist, changes with temperature, so that the measures are as
follows:


Table 1-2-1. Voltage and Amperage

Trial No.    Voltage    Amperage

    1          120        23.8
    2          110        22.1
    3          100        19.7
    4           90        18.0
    5          100        20.2
    6          110        22.2
The principle that amperage is proportional to voltage now fails;
furthermore, no law whatever can work exactly, because the same voltage does
not always produce the same amperage. It is evident that some other
cause or causes, not measured by the scientist, are at work, and the
situation is now one in which statistical analysis can add to the scientist’s
knowledge.
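
The contrast between the two sets of measurements can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python, not part of the original text, and its variable and function names are invented for the illustration. It computes the ratio of voltage to amperage for each trial, first for the exact measurements and then for those of Table 1-2-1.

```python
# Voltage applied in the six trials (the same in both sets of measurements).
voltage = [120, 110, 100, 90, 100, 110]

# Amperage from the first (exact) set and from Table 1-2-1.
amperage_exact = [24, 22, 20, 18, 20, 22]
amperage_table = [23.8, 22.1, 19.7, 18.0, 20.2, 22.2]


def ratios(volts, amps):
    """Return the voltage-to-amperage ratio for each trial."""
    return [v / a for v, a in zip(volts, amps)]


ratios_exact = ratios(voltage, amperage_exact)
ratios_table = ratios(voltage, amperage_table)

# In the first set every ratio is the same, so one law fits all six trials.
print([round(r, 3) for r in ratios_exact])

# In Table 1-2-1 the ratio changes from trial to trial; trials 3 and 5 use
# the same voltage (100) yet give different amperages.
print([round(r, 3) for r in ratios_table])
```

A single constant ratio in the first list and a scatter of ratios in the second is the numerical form of the statement that no exact law can fit the second set.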

But, you may ask, “Why doesn’t the scientist study, in turn, all the
possible causes of the uncertainty?” For example, why doesn’t he keep
every other factor constant and vary first the temperature alone, then the
barometric pressure alone, and so forth, to see whether the amperage
changes or not with each of these other factors? In short, why does he not
isolate and eliminate the statistician’s “element of uncertainty” and get
the experiment back onto the solid and familiar ground of exact
measurement?

This is, of course, exactly what the scientist does when he can. It is
better to isolate and to measure the effect of each variable separately,
by controlling the experiment, than it is to begin by measuring a quantity
which is controlled by a hodgepodge of causes all acting simultaneously
and then to try to separate them later by mathematical analysis.
Unfortunately, the scientist is not always able to control his experiment
completely.

The degree of control which an experimenter exerts over his experiment
varies over a wide range. At one end of the range we have the chemist
or physicist exercising complete or almost complete control over an
experiment in a laboratory; at the other end we have the social scientist studying
the causes of such things as mass shifts of population in a large city.
Somewhere along this continuous range we cease to refer to the study as an
“experiment” and call it an “observation,” in acknowledgment of the
increasingly passive role of the investigator, but the dividing line is not
sharp. In all cases the uncontrolled part of the experiment is likely to
be somewhat larger than it appears at first glance.

Consider, for example, the scientist who is studying the relation between
voltage and amperage in a complex circuit. He can eliminate the effects
of temperature, barometric pressure, humidity, and every other physical
variable which occurs to him, but there may still remain one source of
uncertainty. If he wishes to formulate a law which will be accurate to
four decimals and if his instruments are capable of measuring amperage
only to three decimals, then his own inability to measure with sufficient
accuracy constitutes an element of uncertainty, and to make further
progress he must use statistical tools. It is for this reason that statistics
is becoming increasingly important in the exact sciences. The early
investigators in any field always discover the easy “knock-down-and-drag-out”
laws, which are obvious as soon as some rough measures are assembled.
The later investigators must in many cases study the subtler relationships
which were at first overlooked. In order to make progress they must
utilize the full precision of their instruments, and then, having reached
the limit set by the instruments, they must use statistical methods to
extract the greatest possible amount of information from the uncertain
part of their measurements.

In medical research the uncontrollable elements are more obvious. It
is common practice to test the effectiveness of a medicine by dividing the
patients into two groups which are identical, so far as possible, in average
age, severity of illness, general health, and so forth, and to give the medicine
to one group and not to the other. But who is to decide when all other
possible variables are identical for the two groups? Mrs. Schmidt and
Mrs. Bartolacci are both 62 years old, weigh 147 pounds, and have three
children each. But Mrs. Schmidt worries acutely about her son, who is in
Eastern Germany, so that her appetite is poor and her interest in her
treatment is low; Mrs. Bartolacci’s son has a fine job in Brooklyn and
writes to his mother every day. Mr. Ericson and Mr. Thomas receive
identical dinners of poached eggs, toast, and tea. Mr. Ericson thrives
on this food, but Mr. Thomas develops an allergic reaction to eggs and
breaks out in a severe case of hives, with a mounting temperature!

Even if two identical sets of human beings could be found, the doctor’s
freedom of action in varying their conditions is limited. He might suspect
that malnutrition has in the past been a contributing factor in the failure
of a particular treatment, but he can hardly withhold food from some of
the patients in order to test his theory. And even if he had complete
freedom of investigation, he could never cope with the sheer number of
possible contributing factors, which stretch on to infinity. Furthermore,
many of these factors are intrinsically unmeasurable. Who can measure,
for example, the patient’s motivation to recover?

In practice, the doctor who suspects that malnutrition has had an effect
on the success of a treatment need not admit defeat simply because humane
considerations prevent him from starving his patients. Instead he can
test his hypothesis by collecting data from hospitals all over the world
and taking advantage of the fact that malnutrition unavoidably existed
in some areas.* But the doctor who uses such data must reckon with a
number of new and irrelevant effects, since the hospitals in malnutrition
areas can be expected to differ from each other and from American hospitals
in many ways other than the degree of malnutrition of their patients. In
this situation, all hope of using a “controlled experiment” type of analysis
must be given up, and the doctor must now attempt to separate the causes
as well as he can by means of statistical analysis.

*At the end of World War II, for example, some important information about the
effect of malnutrition on tooth decay was obtained by a statistical study of the diets
and the condition of teeth in Italy, where malnutrition had been serious for several
years.
3. PROBABILITY IN STATISTICAL RESULTS

You will find that statistics differs in a second way from the branches
of mathematics with which you are familiar. Statistical analysis
frequently leads you not to a flat statement that a given answer is correct,
but to a statement that, out of the various possible answers, a given one
is the most likely to be correct. Furthermore, the likelihood or probability
that it is correct is itself determined exactly by statistical analysis.
Statistics is in part the analysis of the degree of uncertainty of results. Because
of this emphasis upon probability, students sometimes begin to feel that
they are treading a quicksand of uncertainty and to wish for the cold
precision of the familiar “let x equal the unknown number” sort of
mathematics.

A little reflection, however, will show the reader that in dealing with
human affairs the concept of probability is vital. We must frequently
base important decisions upon incomplete evidence, simply because some
aspects of the evidence are not available to us. Even if all the evidence
is potentially available, we often do not have time, in our finite lives, to
scrutinize all possible factors in minute detail; we must observe what
we can and make the most reasonable possible hypothesis about the
remainder. Having done so, we must be aware of the degree of uncertainty
contributed to our final decision by the various uncertainties in the separate
factors, and this can be done accurately only with the aid of a mathematical
analysis of probabilities. The study of statistics will perhaps make you
more specifically aware of the role played by chance in your affairs, and
will help you to develop some skill at computing probabilities or
“estimating the odds.” It is to be hoped that after you have studied statistics
you will be less likely to be caught in an unwise action which depends for
its success upon the occurrence of demonstrably unlikely events.

Statistics is in large part the science of gambling and is valuable to us
precisely because human affairs consist of a long series of unavoidable
gambles. We engage in a gambling operation when we choose a career,
when we buy a house or a car, when we choose an insurance policy, or
even when we plant a tree. Whether or not to gamble is a choice not left
to us; we can choose only whether we should gamble blindly or whether
we should analyze our bets and place them to the greatest possible
advantage. And if we make the second choice, then a mastery of the basic
laws of probability becomes a necessary tool.

4. SURVEY OF STATISTICAL PROBLEMS

A more specific acquaintance with the scope and purposes of statistical
study is best obtained by an examination of specific problems to which
the methods are applicable, and the remainder of this section consists of a
sample set of such problems. These will of course not be phrased in the
more exact language used later in the book, but their insertion here will
introduce you to the basic ideas and give you a background for the more
exact formulation to come later.

It is suggested that you answer all these questions to the best of your
ability, using any common-sense methods of reasoning which may seem
appropriate to you, and that you record your answers for future
comparison with the results you will obtain when you solve these same problems
by statistical methods. If you believe that some of the problems do not
contain enough data to justify an answer, record this statement as your
answer.

I. Table 1-4-1 contains the results of recording the maximum
temperatures reached during the illnesses of forty patients with a given kind of
malaria. How can this data be summarized or presented in a more concise
form so that a reader can grasp the essential points at a glance instead of
having to form an impression by studying the forty separate numbers?


Table 1-4-1. Maximum Temperatures of Forty Malarial Patients

104.3  104.8  104.0  104.0  105.4  105.0  103.9  105.8  104.4  104.5
103.3  104.5  104.4  103.7  104.9  104.5  105.3  103.7  105.0  104.2
104.0  104.2  104.5  103.9  104.0  103.4  104.4  105.0  105.7  105.5
104.4  104.8  103.7  105.0  103.7  104.9  104.3  105.3  104.4  104.5
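
One way to condense the forty numbers of Table 1-4-1 is to group them into classes and count how many fall into each. The sketch below is an illustration in Python and is not taken from the original text; the class width of 0.5 degree and all names in it are assumptions chosen for the example.

```python
from collections import Counter

# The forty maximum temperatures of Table 1-4-1, in degrees.
temperatures = [
    104.3, 104.8, 104.0, 104.0, 105.4, 105.0, 103.9, 105.8, 104.4, 104.5,
    103.3, 104.5, 104.4, 103.7, 104.9, 104.5, 105.3, 103.7, 105.0, 104.2,
    104.0, 104.2, 104.5, 103.9, 104.0, 103.4, 104.4, 105.0, 105.7, 105.5,
    104.4, 104.8, 103.7, 105.0, 103.7, 104.9, 104.3, 105.3, 104.4, 104.5,
]

CLASS_WIDTH_TENTHS = 5  # assumed class width: 0.5 degree


def class_start(temp):
    """Return the lower limit of the class containing temp, in tenths of a degree."""
    tenths = round(temp * 10)  # whole tenths avoid floating-point trouble
    return (tenths // CLASS_WIDTH_TENTHS) * CLASS_WIDTH_TENTHS


counts = Counter(class_start(t) for t in temperatures)

# Print one row per class, with a bar of asterisks as a rough picture.
for start in sorted(counts):
    low = start / 10
    high = (start + CLASS_WIDTH_TENTHS - 1) / 10
    print(f"{low:.1f}-{high:.1f}  {'*' * counts[start]}  ({counts[start]})")
```

Six lines of counts replace forty separate readings, at the cost of the detail within each class. A different class width would give a different but equally legitimate summary.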

II. Table 1-4-2 contains the maximum temperatures of twenty additional
patients who were given an experimental treatment designed to reduce
temperature during the illness. What are the essential differences and the
essential similarities between the treated group and the non-treated group?
What form of description of the two sets of data is best suited for making
a quick and exact comparison between the two?


Table 1-4-2. Maximum Temperatures of Twenty Experimental Patients

104.5  104.9  104.5  104.1  103.6  104.6  104.8  104.1  105.1  104.6
104.4  103.8  104.7  104.0  104.8  103.8  103.9  104.7  104.5  104.2

III. The average temperature for the untreated patients is 104.48,
while that for the treated patients is 104.38. At first glance this suggests
that the treatment had some small success in reducing the temperatures
of the patients. However, the difference is only 0.10, which is very small,
particularly if we compare it with the range of 2.50 degrees which exists
between the smallest and the largest in the untreated group. Since the
temperatures differ so much from one patient to another, it is possible that
the difference between the two groups exists only because the treated
group happened accidentally to include an unusual percentage of patients
with tendencies to have slightly lower than average temperatures. In
other words, the difference may be due to chance. Is it reasonable to
believe that the difference of 0.10 arose from such an accidental selection
of patients, or is the difference too large to be explained in this way? In
short, is it more reasonable to believe that the treatment had an effect on
temperatures or to believe that it did not have an effect?
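
The figures quoted in this problem can be reproduced directly from Tables 1-4-1 and 1-4-2. The following short Python sketch is an illustration added for that purpose and is not part of the original text; the names are invented. It computes the two averages, their difference, and the range of the untreated group. It does not decide whether the difference is due to chance, which is the question the problem leaves open.

```python
# Maximum temperatures from Table 1-4-1 (untreated) and Table 1-4-2 (treated).
untreated = [
    104.3, 104.8, 104.0, 104.0, 105.4, 105.0, 103.9, 105.8, 104.4, 104.5,
    103.3, 104.5, 104.4, 103.7, 104.9, 104.5, 105.3, 103.7, 105.0, 104.2,
    104.0, 104.2, 104.5, 103.9, 104.0, 103.4, 104.4, 105.0, 105.7, 105.5,
    104.4, 104.8, 103.7, 105.0, 103.7, 104.9, 104.3, 105.3, 104.4, 104.5,
]
treated = [
    104.5, 104.9, 104.5, 104.1, 103.6, 104.6, 104.8, 104.1, 105.1, 104.6,
    104.4, 103.8, 104.7, 104.0, 104.8, 103.8, 103.9, 104.7, 104.5, 104.2,
]


def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)


mean_untreated = mean(untreated)
mean_treated = mean(treated)
difference = mean_untreated - mean_treated

# Range: the gap between the largest and smallest untreated temperature.
range_untreated = max(untreated) - min(untreated)

print(f"untreated average: {mean_untreated:.2f}")
print(f"treated average:   {mean_treated:.2f}")
print(f"difference:        {difference:.2f}")
print(f"untreated range:   {range_untreated:.2f}")
```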

IV. Although their averages are practically identical, there are
nevertheless important differences between the two sets of numbers. For
example, there are far fewer extremely high temperatures in the treated
group than in the untreated group, which might be a more important
basis for action than a difference between the averages. How can we
best measure these other differences and express them in a convenient
and concise way?

V. In a wire factory, the wire made by one machine is tested by taking
fifty wires from its day’s production and measuring the tension required
to break them. The results are shown in Table 1-4-3.


Table 1-4-3. Breaking Strengths of Fifty Wires

205  204  204  203  205  203  203  203  204  204
206  207  202  202  204  206  203  205  204  204
205  205  203  204  206  201  204  202  207  205
203  203  204  204  205  203  205  201  202  202
204  204  205  206  205  203  204  204  203  204


A purchaser wishes to buy a carload of the wire, under a contract which
guarantees all wire to withstand a pull of 200 pounds, and which permits
the purchaser to void the contract at any time (at considerable expense to
the manufacturer) if any wires subsequently tested break under a pull of
200 pounds. Would you advise the manufacturer to sign such a contract
or not? Would you advise signing if the guaranteed strength were
reduced to 195 pounds?
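
Before forming a judgment it helps to summarize the fifty values of Table 1-4-3. The Python sketch below is an illustration only and is not part of the original text; the names are invented. It reports the weakest wire, the average, and the standard deviation (a measure of spread not yet introduced at this point in the chapter), and then expresses each guaranteed strength as a number of standard deviations below the average. It does not settle the contract question.

```python
import statistics

# Breaking strengths in pounds, from Table 1-4-3.
strengths = [
    205, 204, 204, 203, 205, 203, 203, 203, 204, 204,
    206, 207, 202, 202, 204, 206, 203, 205, 204, 204,
    205, 205, 203, 204, 206, 201, 204, 202, 207, 205,
    203, 203, 204, 204, 205, 203, 205, 201, 202, 202,
    204, 204, 205, 206, 205, 203, 204, 204, 203, 204,
]

weakest = min(strengths)
mean_strength = statistics.mean(strengths)
spread = statistics.stdev(strengths)  # sample standard deviation


def deviations_below_mean(guarantee):
    """How many standard deviations the guarantee lies below the average."""
    return (mean_strength - guarantee) / spread


print(f"weakest wire in sample: {weakest} pounds")
print(f"average strength:       {mean_strength:.2f} pounds")
print(f"standard deviation:     {spread:.2f} pounds")
for guarantee in (200, 195):
    print(f"{guarantee} pounds is {deviations_below_mean(guarantee):.1f} "
          "standard deviations below the average")
```

No wire in the sample broke at 200 pounds, but the sample is only fifty wires out of a carload; how far that evidence can be stretched is the substance of the problem.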

VI. In the above situation, what would your answer be if the contract
specified instead that the average breaking strength of any wire
subsequently delivered must not be below 202 pounds? Would the contract
under this stipulation contain any perceptible risk to the manufacturer?

VII. Suppose instead that the purchaser offered a contract specifying
that not more than 1 per cent of any sample subsequently selected from
the delivered wire should break at 200 pounds. Would this be a better or
a worse contract than the ones described above, from the manufacturer’s
viewpoint?

VIII. In question VI, do you think that the manufacturer should
safeguard himself by specifying in the contract the number of wires to be
included in the sample to be tested? If so, how large a sample would you
recommend?

IX. A man who suspects that he has a serious disease consults a
physician. The physician finds that there is free acid in the man’s stomach,
and this is one of the characteristic symptoms which is always present in
the disease. However, it is not an absolute symptom upon which a
diagnosis can be made, because many people who do not have the disease
nevertheless exhibit the symptom. Specifically, for men in the same age
group as the patient, about 18 per cent of the men who do not have the
disease nevertheless exhibit the symptom. The patient is found to exhibit
three other symptoms of the disease, for which the corresponding
percentages are 12 per cent, 5 per cent, and 9 per cent. If the symptoms are
independent of each other for people not having the disease, how likely
is it that this man has the disease? Do you feel that you need more
information to answer the question? If so, what?
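
The assumption of independence allows one quantity to be computed at once: the fraction of men without the disease who would nevertheless show all four symptoms. The short Python sketch below is an illustration and not part of the original text; the names are invented. Note that this fraction is not the probability that the patient has the disease, which is what the problem actually asks.

```python
import math

# Fraction of men WITHOUT the disease who show each of the four symptoms,
# as given in the problem (18, 12, 5, and 9 per cent).
rates_without_disease = [0.18, 0.12, 0.05, 0.09]

# If the symptoms are independent among men without the disease, the
# fraction showing all four at once is the product of the separate rates.
all_four_without_disease = math.prod(rates_without_disease)

print(f"{all_four_without_disease:.7f}")
print(f"about 1 in {round(1 / all_four_without_disease):,}")
```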

X. A university gives all applicants for admission an entrance
examination. To judge the usefulness of the examination, records are kept of the
grades subsequently made by the examinees in their college courses. The
first column of Table 1-4-4 gives the score made by the examinees on their
entrance examination, and the second column gives the average grade
which each man has made in the mathematics courses which he has taken
during the following four years. An examination of these two columns
reveals that a high test score is almost always followed by a high
mathematics grade, and a low test score by a low mathematics grade. It is
evident that the test measures some of the abilities which enable students
to earn high mathematics grades, but it is equally evident that there must
be other factors present which help to determine the mathematics grade,
but which the entrance test does not succeed in measuring. It follows
that the entrance scores of new students can be used fairly successfully
to predict their mathematics grades, but that the prediction will be subject
to a certain amount of error. A number of practical questions arise:

(a) How can predictions of mathematics averages best be made? In
particular, if Tom Jackson, who is now applying for admission, makes a
score of 18 on his entrance test, what mathematics average might he
reasonably be expected to make in college if he is admitted?
Table 1-4-4. Records for Twenty-Five Students

 Entrance     Subsequent     Subsequent    Experimental
   Test       Mathematics     Language         Test
   Score         Grade          Grade          Score
    (E)           (M)            (L)            (X)

    15            51             58            209
    17            60             47            148
    18            54             51            181
    18            54             77            239
    20            64             52            184
    21            49             63            214
    21            60             61            217
    22            66             54            162
    22            65             70            224
    22            68             77            214
    23            67             90            231
    23            64             59            171
    24            76             76            218
    24            68             82            208
    24            74             83            217
    24            69             66            182
    25            80             74            200
    25            96             49            147
    27            84             98            188
    30            95             79            183
    30            81             84            184
    30            89             68            158
    31            94             88            163
    31            78             74            174
    32            88             93            218

(b) How can the accuracy of this prediction be measured? The two
men who also scored 18 both made subsequent averages of 54, which
is below passing. Since Jackson also scored 18 on his entrance test, it is
reasonable to predict that he also will fail. But the prediction is not
absolute; for example, the second man on the list scored only 17, which is
lower than Jackson’s score, but he nevertheless made a subsequent
mathematics average of 60, which is a passing grade. If Jackson has three
chances in ten of passing and wishes very much to try, perhaps he should
be admitted. But if his chances are only one in a thousand, it would be
foolish to admit him. Exactly what are his chances of obtaining a
mathematics average of 60 or higher if he is admitted?

Figure 1-4-1. Distances of Baseball Throws by 303 High School Girls.
(Based upon data by Leonora Steward and Helen West, the Froebel
School, Gary, Indiana. Reprinted by permission of Prentice-Hall, Inc., from
"Applied General Statistics" by Croxton and Cowden. Copyright 1939
by Prentice-Hall, Inc.)

Figure 1-4-2. Percentage of Dry Matter in 160 Mangel Roots. (Based
upon data from "The Combination of Observations" by David Brunt, by
permission of the Cambridge University Press.)

Figure 1-4-3. Scores of 206 Freshmen on the Thorndyke Intelligence
Test. (Based upon data from "Statistics in Psychology and Education" by
Henry E. Garrett, by permission of Longmans, Green, and Co. Copyright
1947 by Longmans, Green, and Co., Inc.)

(c) The third column gives the subsequent grades which the admittees
made in language courses. A comparison of these grades with the entrance
scores shows that a high entrance test score frequently but not always goes
with a high language grade, and vice versa. The entrance test, in other
words, is not a very successful measure of the abilities which enable students
to make high grades in language, although it is not altogether
unsuccessful either. Can you devise a numerical measure for stating exactly how
much better the entrance test is for predicting mathematics grades than
it is for predicting language grades?
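
One widely used measure of this kind is the Pearson correlation coefficient, which lies near 1 when two columns rise together closely and near 0 when they are unrelated. The chapter has not introduced it at this point, so the Python sketch below is offered only as an illustration of what such a numerical measure might look like; it is not the author’s own method, and the names are invented. One mathematics grade in the table is printed unclearly and is read here as 68, which is an assumption.

```python
import math

# Columns E, M, and L of Table 1-4-4. The tenth mathematics grade is
# unclear in the source and is assumed to be 68.
entrance = [15, 17, 18, 18, 20, 21, 21, 22, 22, 22, 23, 23, 24,
            24, 24, 24, 25, 25, 27, 30, 30, 30, 31, 31, 32]
mathematics = [51, 60, 54, 54, 64, 49, 60, 66, 65, 68, 67, 64, 76,
               68, 74, 69, 80, 96, 84, 95, 81, 89, 94, 78, 88]
language = [58, 47, 51, 77, 52, 63, 61, 54, 70, 77, 90, 59, 76,
            82, 83, 66, 74, 49, 98, 79, 84, 68, 88, 74, 93]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the two averages.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


r_mathematics = correlation(entrance, mathematics)
r_language = correlation(entrance, language)

print(f"entrance test vs. mathematics grade: {r_mathematics:.2f}")
print(f"entrance test vs. language grade:    {r_language:.2f}")
```

The coefficient for mathematics comes out noticeably larger than the one for language, which agrees with the impression gained by inspecting the columns.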

(d) The entrance officials have for some time been dissatisfied with the
entrance test being used in the university and have considered replacing
it with a test of a somewhat different type, in the hope of securing more
accurate predictions of subsequent performance. This second test was
given to the same twenty-five men who took the regular test, with the
objective of comparing the two tests critically and choosing the better
one for future use as soon as enough subsequent data about the
twenty-five men became available. The scores on the experimental test are shown
in column four of Table 1-4-4. Which is the better of the two tests for
predicting mathematics grades? Is it also better for predicting language
grades?

XI. A dealer who purchases a consignment of dice suspects that they are
defective. To make a quick test he throws one of the dice twenty-four
times and finds that a 6 came up seven times, a 5 twice, a 4 three times, a 3
four times, a 2 once, and a 1 seven times. If the die were perfect, one
would of course expect that all faces would turn up approximately four
times apiece in twenty-four trials. It would not be reasonable, however,
to expect each face to turn up exactly four times, because of the presence of
the element of chance. But how large a deviation from this expected
frequency of four is to be regarded as normal? In other words, how large
must the deviations from the expected frequency of four become in order
to convince us that the die is defective? In particular, from the evidence
presented above would you conclude that the die is certainly defective,
or probably defective, or probably not defective?

XII. A surveyor has found by experience that his measurements of
angles are frequently in error by more than one minute of arc but that if
he measures an angle five times and averages the results, the average will
almost always be within one minute of the true value. He undertakes a
new assignment in which the angles must be measured with sufficient
precision so that errors rarely exceed one-half of a minute of arc. Can he
achieve this precision by increasing the number of times he measures the
angle? If so, how many measures should he make to insure the required
accuracy?
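
A standard result, which this chapter has not yet derived, is that when the errors of the separate measurements are independent and equally variable, the typical error of an average shrinks in proportion to one over the square root of the number of measurements. The Python sketch below applies that rule to the surveyor’s case. It is an illustration under exactly those assumptions, with invented names, and is not a substitute for the treatment given later in the book.

```python
import math


def measurements_needed(current_count, current_error, target_error):
    """Measurements required to shrink the error of an average.

    Assumes independent, equally variable errors, so that the error of
    the average is proportional to 1 / sqrt(number of measurements).
    """
    shrink_factor = current_error / target_error
    return math.ceil(current_count * shrink_factor ** 2)


# Five measurements keep the average within one minute of arc;
# the new assignment calls for one-half minute.
needed = measurements_needed(current_count=5,
                             current_error=1.0,
                             target_error=0.5)
print(needed)
```

Halving the error therefore calls for four times as many measurements, not twice as many, which is why added precision becomes expensive so quickly.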

XIII. All the preceding problems have been of practical nature; that
is, they have illustrated the use of statistical analysis as a tool for gaining
insight into practical problems. The following question is of an entirely
different nature; it is concerned instead with the theory of statistics.

Figure 1-4-1 is a diagram which shows the distances which 303 freshman
high school girls in Gary, Indiana, were able to throw a baseball. The
height of each column indicates the number of girls whose throws were
between the limits given across the bottom of the diagram. Thus, reading
from the left, the diagram tells us that only one girl threw less than 25
feet, two girls threw farther than 25 feet but less than 35, seven threw
farther than 35 but less than 45, and so forth. (Such a diagram is called
a histogram and is a useful and widely used tool in statistics. The
construction and use of histograms will be discussed in Chapter 2.)

In Figure 1-4-2 the same sort of representation is used to show the
percentage of dry matter in a sample of 160 mangel roots. Two of the
mangel roots contained between 10 and 11 per cent dry matter, seven
contained between 11 and 12 per cent, and so forth. Figure 1-4-3 shows
the scores of 206 freshmen examinees on an intelligence test.

As one examines these three figures, one is struck by the fact that there
is an unmistakable similarity in the fundamental nature of the three
curves. The width of the vertical bars is different in each case, but
this is determined by the range which the tabulator chooses to include in
each group, and has nothing to do with the physical nature of the
distribution. To help eliminate this extraneous factor, the author has
sketched a smooth curve in each figure, which follows the general trend
of the tops of the bars. We see that in all three cases the curve is
convex upward in the center and that it changes to convex downward on
each side of the central peak, then levels off to a horizontal direction
as it reaches the zero line. Other examples will show that the same sort
of curve is obtained when we plot such diverse things as the sizes of the
eggs of a certain marine snail in Greenland, the neck sizes of male
college students, the batting averages of baseball players, and many
other types of data. These curves are so similar that a single equation
can be used to describe all of them.

This similarity raises some interesting questions. Is there a hidden
essential similarity between the “forces” governing the distribution of
intelligence among American college students and those governing the
egg sizes of marine snails in Greenland? If so, what are the “forces” and
how do they operate? For any new collection of data, can we tell in
advance whether we should expect this same type of distribution or not?

5. SCOPE OF STATISTICS 

The list of human activities and interests in which statistical thinking
plays a part is a very long one. The following outline will suggest to
the reader the variety of the fields in which statistical procedures are
being or can be used.

I. The Exact Sciences 

We have already indicated that the physicist, chemist, or astronomer
uses statistical analysis to estimate the amount of uncertainty present
in measurements and to trace the effects of this uncertainty through his
subsequent computations and conclusions. But many further uses can be
cited. For example, the physicist frequently deals with problems
involving the behavior of things too numerous to study individually, such
as the separate molecules in a gas. In such cases he uses statistical
formulations based upon the probability that a given molecule will have a
given velocity, and with the aid of this treatment he is able to predict
the behavior of the entire population of molecules, that is, of the gas
as a whole. There are individual molecules with erratic or exceptional
velocities, just as there are individual human beings whose behavior is
unusual, but so long as we are able to measure the probability that a
given individual atom or human being will have a given type of behavior,
we can allow for the net effect of these mavericks.

In the last few decades there has grown up a fascinating new branch of
modern physics called quantum mechanics, which has dramatically overcome
the long-standing problem of the fundamental nature of light and matter
which has bedeviled physicists for centuries. In this new
interpretation, light and atoms merge into one entity, which can only be
described as “probability density,” or probability (per unit volume) that
the atom or photon will be found to be occupying a given position in
space. A light wave is a wave of this “probability density,” and so is a
moving atom. The need for advanced statistical theory here has been so
urgent that the physicists have surpassed the statisticians in
developing some specialized branches of the subject.

II. The Biological Sciences 

The usefulness of statistics in these fields is so obvious that a single
example will suffice. If an agricultural research worker measures the
yield of an experimental variety of corn, he is observing the results of
a common factor (the genetic properties of the variety) overlaid by a
number of random factors which differ for each plant (soil conditions,
availability of water and sunlight, and so on), and he must use
statistical methods to isolate and measure the common factor, in which he
is primarily interested. This combination of a constant ingredient with a
random ingredient is so nearly universal in these fields that some
knowledge of statistics is a basic necessity for all workers in biology,
zoology, botany, livestock breeding, agricultural experimentation, and so
forth.

III. The Social Sciences 

The situation here is much the same as for the biological sciences;
almost every quantity which the investigator can measure is affected by a
number of factors, some of which are always random with respect to the
other variables in the study. If a sociologist wishes to study the
connection between poverty and crime, he can readily collect data for a
number of communities showing the distribution of incomes and the
incidence of crime, but in interpreting his data he is faced with the
fact that the sociological pressures behind each individual crime were
very complex and included many factors besides that of poverty. If a
teacher tries to evaluate the effectiveness of a given textbook, he can
test the students who used the book and compare their proficiency with
that of students who used other books instead. But he must reckon with
the fact that a certain amount of random variation from class to class is
to be expected as a result of chance, and he must be sure that any
observed difference is too large to be explained by chance before he can
conclude that the textbook has made a difference.

An example of a statistical approach is to be found in the December,
1949, Atlantic Monthly, in an article which analyzes the attitudes of the
members of the Supreme Court in cases involving alleged violations of
civil rights. The author points out that there are some apparent
contradictions in attitude among the Justices, and goes on to say:

“Does this not make it plain, some member of the bar may ask,
that nothing of value is to be concluded from grouping cases and aligning
Justices for statistical review?

“The question should answer itself. Obviously not much is to be told
by only three cases. But a great deal may be learned from a study of
twenty times three. Who would say that since three roll calls give little
insight into the voting attitudes of a member of Congress, sixty roll
calls would be no more informative?

“Justices generally do not like to be put in tables. They may prefer to
think that the work they do cannot be counted up in columns of figures.
No impertinence is intended in the suggestion that they had better begin
to get used to some numerical analysis. Legal scholars are using it and
doubtless they are going to use it more and more.”*

The data referred to in this passage is used as illustrative material for
problems in Articles 7 and 8 of Chapter 11.

IV. Industry 

There are a number of more or less unrelated uses for statistics in
industry. A few of them are as follows:

A. Production Control and Analysis. An industrial machine usually turns
out a product which is almost uniform, but which nevertheless contains a
small random variation of size, or weight, or strength, or color. If the
product is destined for use in precision machinery, this small variation
must be studied and controlled.

B. Economic Forecasting. When a business is to be established or a
factory built, the owners must frequently commit themselves to an
expenditure which can be recovered only by twenty or thirty years of
successful operation. They must therefore not only know the present
demand for the product, availability of raw materials, and transportation
facilities, but they must also make the most reasonable possible
prediction of the future course of these factors. When a life insurance
company fixes its rates, it is making a forecast concerning the average
longevity of the men in each group who pass the physical examinations.
This forecast cannot be based merely upon the assumption that longevity
of any group in the future is going to be the same as the longevity of a
similar group has been in the past; it must be based also upon the
presence of various changing factors in medicine, nutrition, and general
health.

*Reprinted from “Truman Reshapes the Supreme Court,” by Irving Dilliard,
December, 1949, by permission of the Atlantic Monthly and Mr. Dilliard.
Copyright by the Atlantic Monthly, 1949.

C. Personnel. Statistical studies concerning employees are undertaken
for a wide variety of reasons. Abilities are sometimes measured by
standard tests, and the results are used to assign the employees to other
tasks where their abilities will be used more effectively. A recent study
was made by a trucking company which indicated that many of the drivers
had far more accidents than they were entitled to “on the law of
averages”; in other words these drivers had at any time a higher
probability of having an accident than other drivers. The company saved
many lives (and, of course, saved itself much money in damages) by simply
reassigning these employees to non-driving duties. We do not know which
of these drivers would have had fatal accidents, and undoubtedly many
drivers were transferred who would have been lucky enough to avoid
further accidents. But as a group these drivers were accident-prone, and
the accident rate went down after their transfer.

D. Public Relations. Many businesses, particularly those involved in
selling directly to the public, can improve their effectiveness by
studying public opinion and modifying their product or their approach to
meet the wishes of their customers. In many industries the public opinion
poll is a matter of course. For example, some manufacturers in the
automobile industry recently polled the public about their wishes
concerning various details of automobile design, such as body size,
amount of chromium trim, and quality of upholstery. Another example is
that of the motion picture industry. In this industry it is obvious that
the customer has no first-hand information about the quality of the
product until after he has purchased it, that is, until he has seen the
picture. At the time of his decision to see it, he frequently knows only
such things as the name of the picture and the leading actors or
actresses in it. Statistical studies have shown that the name of the
picture has a strong effect on the attendance, and these studies have led
to the formulation of general principles which must be followed in
choosing profitable names for new films. (The author is tempted to add
the slightly acid remark that the studies have evidently persuaded some
manufacturers that it is better to produce a good name for a picture than
it is to produce a good picture.)

V. Literature and Arts

It is a little surprising to find statistical methods used in studies of
literature, where one might expect the material to be totally unsuitable
for numerical measurement of any kind. However, studies have shown that
such things as the pattern of frequency of use of various words or
frequency of various lengths of sentences are remarkably stable and
remain more or less the same for any given author. Several questions of
disputed authorship have been settled by a study of the frequency
distribution of lengths of sentences. Recently the frequencies of use of
various words have been tabulated for a large number of literary (and
non-literary) works, and a very remarkable general law has been found to
describe all these distributions.*

VI. Statistics for the Layman 

For a man who is not engaged in any of these fields, is there any purpose
in studying statistics? There are several. Every man is a taxpayer, or a
voter, or an insurance purchaser, or an investor in real estate, or a
purchaser of consumer goods. Many laymen are interested in consumer
cooperative groups of various sorts. All people have a financial interest
in the economic trends in their communities. Everyone is subjected to
advertising which uses (and sometimes misuses) statistical data or
reasoning. And we all live in an age of political propaganda, in which
news from all parts of the world has been “slanted” in such a way as to
affect our opinions. We must learn to discount the interpretation which
is put upon the data by people whose interests are at stake and try to
discern for ourselves the justifiable conclusions. At the time of this
writing the American people are being subjected to a set of arguments
concerning the desirability or undesirability of what is generally called
“socialized medicine.” As interpreted by one side, the data prove that
the American people are now receiving adequate medical care, while the
other side uses the same data to prove that medical care for some income
groups is grossly inadequate. In self defense the ordinary citizen must
equip himself to draw his own conclusions.

And, for one last point, the ordinary citizen needs a little statistical
knowledge to protect himself against amateur statisticians. We have
commented upon the misuse of statistics by the unscrupulous, but it is
also misused by the inept. It is a field in which a little knowledge is a
dangerous thing, and many an unwarranted conclusion is urged upon the
unsuspecting public in the name of statistics. This topic will be
discussed at some length in Chapter 13, which deals with some of the
particular pitfalls of statistical logic.


*See, for example, G. K. Zipf, Human Behavior and the Principle of Least
Effort. Addison-Wesley Press, Cambridge, Mass., 1949.

6. HOW TO STUDY STATISTICS

As in any branch of mathematics, the process of learning statistics
consists chiefly in operating with it. A procedure which is only
described to you will be vague and shadowy, but if you simultaneously
apply it while you read, you will master it more quickly and at a deeper
level. Study with plenty of paper and pencils at hand, and carry out for
yourself the mathematics of all derivations, supplying all the missing
steps. Then carry out for yourself the computations in the illustrative
examples, using the figures in the textbook only as a check upon your own
work.

The theoretical derivations in the book form an interlocking series in
which a great deal of space is saved by cross references to previous
results. Each equation to which reference is made is numbered in such a
way as to indicate the chapter and the number of the article in which it
occurs, and the task of locating the equation is further facilitated by
the fact that the chapter and article numbers are given at the head of
each double page throughout the book. You can therefore refer to any
equation with little loss of time.

For quick reference to frequently used equations a table is provided in
Appendix VII. Perhaps you will find it advantageous to mark this and
other tables in the back of the book with tabs to make them readily
accessible.

At the end of each chapter is a summary of the operational procedures
described in the chapter. These summaries will be useful for review of
the procedures and their interpretations. Also, for the inexperienced
mathematician who finds himself occasionally carried beyond his depth,
these summaries will provide a means of following the thread of the
operational aspects of the subject.

1. INTRODUCTION

The raw data with which a statistician begins his work usually consists
of a large number of more or less related individual measurements or
other numerical information. The investigator’s first task is to organize
or summarize this body of data, that is, to reduce it to a form in which
its essential character can be perceived and in which it can be compared
quickly and accurately with other similar sets of data. This organizing
operation is a necessary preliminary to any further statistical analysis
of the data which might be undertaken.

The basic procedure for accomplishing this objective consists of grouping
together the quantities which are alike or nearly alike, counting the
number in each group, and tabulating the results of the count in a
standard form. Such a count is called a frequency tabulation.

When such a tabulation has been made, it is often useful to express the
results in graphical form. Two graphical devices for this purpose, with
slightly different uses, are the histogram and the ogive. The
construction and the uses of frequency tabulations, histograms, and
ogives will form the chief content of this chapter.

2. PROCEDURE FOR MAKING TABULATION 

The procedure of forming a frequency tabulation can be demonstrated
most quickly by means of an example. Table 2-2-1 consists of a list of
forty numbers which represent the breaking strengths of a sample set of

Table 2-2-1 Breaking Strengths of Forty Wires

204   201   207   206   208   207   208   203
206   207   202   204   205   207   202   206
203   207   205   204   203   208   203   206
205   205   207   205   206   207   204   206
206   205   207   203   207   205   206   206

wires taken from a day’s production of the machine. An inspection of the
table shows us that there are only eight different numbers represented,
from 201 to 208 inclusive. We begin by listing these eight numbers
vertically, beginning with 201 and continuing to 208. The first number in
the table is 204, and we enter this number in our tabulation by placing a
checkmark opposite 204. We continue in this way until all the numbers
are entered in the tabulation, as shown in Table 2-2-2. The last step of


Table 2-2-2 Frequency Tabulation

Breaking Strength    Tally          Frequency
       (x)                             (f)

       201           |                  1
       202           ||                 2
       203           |||||              5
       204           ||||               4
       205           ||||| ||           7
       206           ||||| ||||         9
       207           ||||| ||||         9
       208           |||                3

the operation consists of finding the total of the tally for each line and
entering it at the right in the column headed “frequency.”

In future discussions it will be convenient to have a standard
terminology to describe the elements of this procedure. The original
numbers are called variates. The symbol x_1 stands for the first variate
in the list (204 in our case), x_2 stands for the second variate (206),
and so forth, while the letter x, without a subscript, stands for any
variate. Each group of variates in the frequency tally is called a class,
and the number of variates in any class is called its frequency, for
which f is a standard abbreviation. The sum of all the frequencies (i.e.,
the total number of variates) is customarily denoted by the letter N.
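
The tallying procedure just described can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the original procedure; the names `variates`, `frequency`, and `N` are chosen here to echo the terminology above, and the order in which the forty values are listed does not affect the tabulation.

```python
from collections import Counter

# The forty breaking strengths of Table 2-2-1.
variates = [
    204, 201, 207, 206, 208, 207, 208, 203,
    206, 207, 202, 204, 205, 207, 202, 206,
    203, 207, 205, 204, 203, 208, 203, 206,
    205, 205, 207, 205, 206, 207, 204, 206,
    206, 205, 207, 203, 207, 205, 206, 206,
]

frequency = Counter(variates)    # class x -> frequency f
N = sum(frequency.values())      # total number of variates

# Print one line per class: the value, a tally, and the frequency.
for x in sorted(frequency):
    f = frequency[x]
    print(f"{x}  {'|' * f:<9}  {f}")
print("N =", N)
```

The printed frequencies can be checked line by line against Table 2-2-2.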

PROBLEMS 

1. Form a frequency tabulation of the data in Table 1-4-3.

2. In Problem 1, what is the numerical value (a) of x_3? (b) of

3. BOUNDARIES AND LIMITS 

In general, if we group only identical numbers together into classes,
there will be too many classes and too few variates in each class. This
is true, for example, for the data in Table 2-3-1. In this table the
temperatures range from 98.1 to 99.9, so that we would require nineteen
classes if we let each separate temperature constitute a class by itself.
Some of

Table 2-3-1 Temperatures of Forty Patients

98.4   98.6   98.8   99.0   99.3
99.0   99.0   98.8   98.6   99.7
98.1   99.5   99.3   99.3   98.3
98.6   99.3   98.7   99.9   99.0
98.9   98.8   99.4   99.1   98.1
98.8   98.9   99.5   98.8   99.3
99.5   98.9   98.5   99.3   99.2
98.5   98.1   98.9   98.6   98.5

these classes would be empty, while others would contain only one or two
entries, and the tabulation would fail to fulfil its objective of
presenting the data in a quickly graspable form.

In such a case the tabulation will be more informative if we group
together a small range of variates which are nearly alike, instead of
grouping together only those which are exactly alike. For these forty
temperatures, for example, we can obtain a useful tabulation by grouping
together all temperatures within a range of 0.3 degrees, as shown in
Table 2-3-2.


Table 2-3-2 Temperatures of Forty Patients

  Temperature          f

 98.0 to  98.2         3
 98.3 to  98.5         5
 98.6 to  98.8        10
 98.9 to  99.1         9
 99.2 to  99.4         8
 99.5 to  99.7         4
 99.8 to 100.0         1

From an inspection of this table, it is obvious that the nature of the
distribution has been made more clearly apparent than it would have been
if we had assigned only identical temperatures to each class.
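
The grouping that produces Table 2-3-2 can be sketched as follows. The temperatures are stored as whole tenths of a degree (984 for 98.4) so that the class arithmetic is exact; that representation, the function name `tabulate`, and the constant names are assumptions made for this illustration. The sketch also assumes that no value lies below the first lower limit.

```python
# Temperatures of Table 2-3-1 in tenths of a degree (984 means 98.4).
temps = [
    984, 986, 988, 990, 993,
    990, 990, 988, 986, 997,
    981, 995, 993, 993, 983,
    986, 993, 987, 999, 990,
    989, 988, 994, 991, 981,
    988, 989, 995, 988, 993,
    995, 989, 985, 993, 992,
    985, 981, 989, 986, 985,
]

FIRST_LOWER_LIMIT = 980   # 98.0, lower limit of the first class
WIDTH = 3                 # each class covers 0.3 degree, in tenths


def tabulate(values, first_lower_limit, width):
    """Return {(lower limit, upper limit): frequency}, limits in tenths."""
    n_classes = (max(values) - first_lower_limit) // width + 1
    table = {}
    for k in range(n_classes):
        lower = first_lower_limit + k * width
        table[(lower, lower + width - 1)] = 0
    for v in values:
        k = (v - first_lower_limit) // width     # index of the class holding v
        lower = first_lower_limit + k * width
        table[(lower, lower + width - 1)] += 1
    return table


table = tabulate(temps, FIRST_LOWER_LIMIT, WIDTH)
for (lower, upper), f in table.items():
    print(f"{lower / 10:5.1f} to {upper / 10:5.1f}   {f}")
```

Changing `WIDTH` or `FIRST_LOWER_LIMIT` gives other groupings of the same forty values.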

Again, a standard terminology is useful in discussing the procedure.
The smallest and the largest values which are to be included in any class
are called the class limits. In Table 2-3-2, for instance, 98.9 is the
lower limit of the fourth class, and 99.7 is the upper limit of the sixth
class.

If we are interested in precise results, it is necessary to consider
carefully the exact extent of each of the classes. It is reasonable to
assume, unless we have evidence to the contrary, that each of the
temperatures recorded in Table 2-3-1 has been rounded off to the nearest
tenth of a degree, that is, that the observer has recorded, in each case,
the tenth of a degree which is nearest to the actual reading of the
thermometer. If, for example, the mercury level is between 98.5 and 98.6,
but closer to the latter, then the recorded temperature will be 98.6 and
the variate will fall in the third class. If, on the other hand, the
mercury is closer to 98.5 than it is to 98.6, then the recorded
temperature will be 98.5 and the variate will fall in the second class.
The dividing line between the second and third classes is therefore
halfway between 98.5 and 98.6; it is in other words exactly at 98.55.
Such dividing lines are called class boundaries. The value 98.55 is the
lower boundary of the third class; it is also the upper boundary of the
second class. It is important to notice the distinction between limits
and boundaries. The boundary between two classes is always midway between
the upper limit of one class and the lower limit of the following class.

The midpoint of any class is called the class mark. It is the value
midway between the lower limit and the upper limit of the class. For
example, the class mark of the second class is 98.4, while that of the
third class is 98.7. The width of each class, from boundary to boundary,
is called the class interval, and is customarily denoted in equations by
the symbol C. In the above example, the class interval is 0.3. A list of
these terms, with their standard abbreviations and instructions for
computing them, will be found in the summary at the end of the chapter.
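
The relations among limits, boundaries, class mark, and class interval can be checked with a short sketch. The function `describe_class` and its `unit` parameter (the precision to which the data were rounded, here one tenth of a degree) are names introduced for this illustration only. Decimal arithmetic is used so that values such as 98.55 come out exactly.

```python
from decimal import Decimal


def describe_class(lower_limit, upper_limit, unit="0.1"):
    """Boundaries, class mark, and class interval of one class.

    The limits are given as strings such as "98.6"; `unit` is the
    rounding precision of the recorded data (an assumed parameter).
    """
    lower, upper = Decimal(lower_limit), Decimal(upper_limit)
    half = Decimal(unit) / 2
    lower_boundary = lower - half
    upper_boundary = upper + half
    return {
        "lower boundary": lower_boundary,
        "upper boundary": upper_boundary,
        "class mark": (lower + upper) / 2,
        "class interval": upper_boundary - lower_boundary,
    }


second = describe_class("98.3", "98.5")
third = describe_class("98.6", "98.8")
for name, value in third.items():
    print(f"{name}: {value}")
```

The upper boundary of the second class and the lower boundary of the third class come out equal, as the definition of a boundary requires.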

PROBLEMS 

1. Form a frequency tabulation of the forty temperatures in Table 1-4-1,
using a class interval of 0.5, and starting with 103.0 for the lower limit
of the first class. Form a similar frequency tabulation* of the twenty
temperatures in Table 1-4-2. Describe in words the chief similarities and
the chief differences between these two distributions, as revealed by the
tabulations.

2. What is the upper limit of the fifth class in Table 2-3-2? What is the
upper boundary of this class? What is its class mark? What is the lower
boundary of the sixth class? What is the lower limit of the sixth class?

3. What is the upper boundary of the first class in Table 2-2-2? What is
the class mark of this class?

4. GRAPHING OF FREQUENCY TABULATIONS 

The inspection of a distribution, or the comparison of one distribution
with another, can be facilitated greatly if we present the distributions
in graphical form. The simplest and most widely used graphical form is
the histogram, which consists of a plot in which the horizontal scale
represents the values of x of the various classes, and the vertical scale
represents the frequencies of these classes.

The construction of a histogram will be illustrated with the data in
Table 2-3-2. We begin by laying off a horizontal scale running from about
97.5 to 100.5, or enough to include all the classes. We then locate on
this scale the points corresponding to the boundaries of the classes, at
97.95, 98.25, 98.55, and so forth. These points divide the scale into
segments,

*It is suggested that you retain these tabulations for use in future
problems.

Figure 2-4-1. A Histogram.

each of which corresponds to one class. Upon each such segment we
erect a column whose height gives us the frequency of the class. The
completed histogram is shown in Figure 2-4-1. It is important to notice
that the base of each column consists of that portion of the x-scale
corresponding to the interval between the upper and lower boundaries of
the class, rather than its upper and lower limits.

A distribution is sometimes represented graphically by means of a
frequency polygon, which is similar to a histogram except that a single
point is plotted for each class, at the class mark, and these points are
then connected by straight lines. An example is shown in Figure 2-4-2,
based upon the same data as the preceding figure. Of the two, the
histogram is more widely used, perhaps because it emphasizes the fact
that the location of the individual variates inside the class is no
longer specified when the data is in the form of a frequency tabulation.
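
The two constructions can be compared in a short sketch that computes, from the classes of Table 2-3-2, the base and height of each histogram column and the points of the frequency polygon. It prints a rough text picture rather than drawing a figure; the function names and the `HALF_UNIT` constant (half of the assumed rounding unit of 0.1 degree) are illustrative.

```python
# Classes of Table 2-3-2 as (lower limit, upper limit, frequency).
classes = [
    (98.0, 98.2, 3), (98.3, 98.5, 5), (98.6, 98.8, 10), (98.9, 99.1, 9),
    (99.2, 99.4, 8), (99.5, 99.7, 4), (99.8, 100.0, 1),
]
HALF_UNIT = 0.05   # half of the 0.1-degree rounding unit (assumed)


def histogram_columns(classes):
    """Return (lower boundary, upper boundary, height) for each column."""
    return [(round(lo - HALF_UNIT, 2), round(hi + HALF_UNIT, 2), f)
            for lo, hi, f in classes]


def polygon_points(classes):
    """Return (class mark, frequency) for each point of the polygon."""
    return [(round((lo + hi) / 2, 2), f) for lo, hi, f in classes]


# Histogram: each column stands on the segment between two boundaries.
for lower, upper, height in histogram_columns(classes):
    print(f"{lower:6.2f} to {upper:6.2f} | {'#' * height}")

# Frequency polygon: one point per class, located at the class mark.
for mark, f in polygon_points(classes):
    print(f"point at x = {mark:.1f}, frequency = {f}")
```

The column bases run from boundary to boundary, while the polygon points sit at the class marks, which is the distinction the text draws.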

PROBLEMS 

1. Draw histograms of the distributions shown in Tables 1-4-1 and 1-4-2,
using the tabulations which you made in the first problem of the
preceding section. Do these graphs add to your knowledge of the
differences between the two distributions?

2. Draw frequency polygons of these two distributions.

3. Make a histogram of the data in Table 2-2-2. (Note that the boundaries
between classes should be placed at 200.5, 201.5, and so forth.)

4. Collect a set of data from your own observations, form a frequency
tabulation from it, and construct a histogram. Retain this data for
future practice exercises. If you cannot think of a source for suitable
data, read the following suggestions. Ask each of twenty students how
much money he has in his possession at the moment, or how many cigarettes
he smokes per day, or what his grade average is, or what his weight and
height are. Measure with a millimeter ruler the lengths of twenty or
thirty leaves chosen at random from a single tree. Record the number of
yards gained by your team on each play during a quarter of a football
game. Attend a target shooting contest and record the number of shots in
each zone of the target. Record the number of spades, or the number of
cards higher than a ten, in each hand you hold during a game of bridge.
(If you collect two or more items of information about each individual,
such as length and width of tree leaves, your data will be useful also
for practice in the theory of correlation in Chapter 9.)


5. THE CHOICE OF CLASS INTERVAL 

In the instructions given above, no mention has been made of the
problem of choosing a class interval of proper size. This is a complex
question for which there is no single answer. Let us begin by inspecting
the results of an experiment in which a tabulation is repeated with
widely different class intervals. The experiment has been carried out
with the data in Table 2-3-1, and the results of the experiment are shown
in Figure 2-5-1. At the top of the figure is a histogram with class
interval 0.1. It is obviously of little use because it fails to
accomplish the function of a histogram, namely, that of organizing the
data so that the properties of the distribution are apparent upon
inspection. As presented in this upper histogram, the data are nearly as
disorganized as they were in the original list of forty temperatures! In
the second histogram the interval has been increased to 0.2, and the
histogram shows a little more structure, although the frequencies still
jump up and down in an erratic fashion as we go from

Figure 2-5-1. Effect of Varying Class Interval.

one class to the next. In the third histogram, the interval is 0.4, and
the frequencies no longer oscillate, but climb steadily to a maximum and
then decline again, indicating a well-defined structure for the
distribution. In the fourth histogram, the interval is 0.8, and the
histogram now consists of three large classes. The classes are now so
wide that it becomes important to know where the variates are located
inside each class, but the histogram does not tell us. We feel
intuitively that the third of these four histograms contains more useful
information than the others.

But in exactly what sense is the third histogram more “informative”
than the others? It certainly does not contain more factual information
about the original forty numbers in Table 2-3-1. As a description of
these numbers, the first histogram is better than any of the others,
since it locates the variates more exactly, and the set of original data
is better still.
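
The four tabulations compared above can be reproduced approximately with the sketch below. The text does not state where the classes of Figure 2-5-1 begin, so the lower limit of 98.0 for the first class is an assumption; with that choice the frequencies behave as described (erratic at 0.2, a steady rise and fall at 0.4, three classes at 0.8). Temperatures are again held in tenths of a degree, and the function name is illustrative.

```python
# Temperatures of Table 2-3-1 in tenths of a degree (984 means 98.4).
temps = [
    984, 986, 988, 990, 993, 990, 990, 988, 986, 997,
    981, 995, 993, 993, 983, 986, 993, 987, 999, 990,
    989, 988, 994, 991, 981, 988, 989, 995, 988, 993,
    995, 989, 985, 993, 992, 985, 981, 989, 986, 985,
]

FIRST_LOWER_LIMIT = 980   # 98.0; assumed, not given in the text


def class_frequencies(values, first_lower_limit, width):
    """List of class frequencies for classes `width` tenths wide."""
    n_classes = (max(values) - first_lower_limit) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - first_lower_limit) // width] += 1
    return counts


# Class intervals of 0.1, 0.2, 0.4, and 0.8 degree, expressed in tenths.
for width in (1, 2, 4, 8):
    counts = class_frequencies(temps, FIRST_LOWER_LIMIT, width)
    print(f"interval {width / 10:.1f}: {counts}")
```

A different starting limit would shift individual frequencies somewhat, but the general contrast among the four intervals would be expected to remain.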

This question can be answered if we define our objective more exactly.
We are not so much interested in describing this particular set of forty
temperatures of malarial patients as we are in describing the
distribution of temperatures to be expected from such malarial patients
in general. The third histogram shows the general structure of the
distribution, which we would expect to find in any group of patients
similar to these and which we could use in predicting the percentage of
future patients whose temperatures will reach various levels. The first
histogram, on the other hand, contains accidental ups and downs which are
probably characteristic only of the experimental group of forty patients
and which we would not expect to see repeated for future patients. The
fact that we intuitively regard the third histogram as superior to the
first indicates that our intuitive objective is to go beyond our
particular data and draw conclusions about the larger body of data from
which it came.

This important distinction is customarily described by the formal
terms “universe” and “sample.” The universe is the sometimes hypothetical
collection of all possible measures, real or potential, of the
phenomenon in question, while the sample is a relatively small group of
actual observations selected by some random process from the universe. A
little reflection will show that the objective of the investigator is
almost always to draw conclusions about the universe rather than about
the sample. Doctors who perform autopsies upon dead cancer patients are
not primarily interested in drawing conclusions about dead patients; they
are more likely to be looking for information which is true of all cancer
patients, so that it can be used in diagnosis or treatment. The
agricultural research man who studies the yield of an experimental new
variety of corn is interested in drawing conclusions which will also be
true about all of the corn of this variety which might in future be grown
in any comparable agricultural region.

From this point of view, the purpose of forming a frequency tabulation
and drawing a histogram is to remove the special accidental peculiarities of
the sample and to leave for demonstration the more stable properties
which we believe to be characteristic of the universe. In the light of this
objective let us formulate a rule-of-thumb for selecting an optimum class
interval. We should choose for the class interval the smallest value which
will give us a fairly smooth histogram, without excessive jumps from one
frequency to the next. It is obviously good strategy, if you are in doubt,
to use a small class interval in the initial tabulation, for if you decide later
that it is too small, you can simply add your frequencies in pairs and thus
obtain a tabulation with an interval twice as large without retabulating
the original data.
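The pairing strategy just described can be sketched in a few lines of
Python. This is only an illustration: the function name merge_in_pairs and
the list of frequencies are hypothetical and do not come from the tables in
this book.

```python
def merge_in_pairs(frequencies):
    """Add adjacent class frequencies in pairs, doubling the class interval."""
    padded = list(frequencies)
    if len(padded) % 2:              # odd number of classes: pad with an empty class
        padded.append(0)
    return [padded[i] + padded[i + 1] for i in range(0, len(padded), 2)]


# Hypothetical frequencies tabulated with a small class interval.
fine = [1, 4, 2, 7, 9, 6, 8, 3, 2, 1]
coarse = merge_in_pairs(fine)        # same data, interval twice as large
print(coarse)                        # [5, 9, 15, 11, 3]
```

No variate is recounted; the total frequency is unchanged by the merging.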

In anticipation of a more detailed discussion of this topic later in the
book, it is profitable to consider here the form which our histogram would
take if an infinite universe were available for study, instead of only a
relatively small sample. We could then make the class interval as small
as we pleased, and we would still be assured that there would be enough
variates in each class so that its frequency in comparison with the
neighboring classes would be fairly stable, that is, that it would not be
affected much by the accidental “give-and-take” of a few cases. As we made
the class interval smaller and smaller, the tops of the rectangles would
become narrower and narrower, and the contours of the histogram would
approach closer and closer to a smooth, rounded curve, as shown in Figure
2-5-2. Carrying this argument to its limit, we see that an infinite universe
could be represented by a smooth curve. From this point of view, we
can state our conclusions in another way. In forming a histogram, we
should choose a class interval of such a size that the general trend of the
tops of the rectangles gives us the best possible approximation to the
hypothetical smooth curve which would represent the universe. In many
cases the investigator, after constructing a histogram, sketches a smooth
curve over it in order to show what he believes the distribution of the
universe to be like. Such sketches are shown in Figures 1-4-1, 1-4-2, and
1-4-3, page 10.

Figure 2-5-2. Effect of Increasing Sample Size.

PROBLEMS 

1. Table 2-5-1 is based upon admissions of dementia praecox patients to the
Philadelphia General Hospital. Construct several histograms, using a different
class interval for each. Which class interval, in your opinion, is best for the
purpose of demonstrating the relationship between the age of the patient and the
incidence of dementia praecox?


Table 2-5-1. Ages of 199 Dementia Praecox Patients upon Admission*

    Age   No.      Age   No.      Age   No.      Age   No.
     15     1       24    12       33     2       42     3
     16     3       25    15       34     8       43     0
     17     3       26     7       35     5       44     4
     18     6       27     9       36    10       45     4
     19     6       28     8       37     5       46     2
     20     9       29    10       38     2       47     1
     21     8       30    11       39     5       48     1
     22     8       31    10       40     4       49     2
     23     5       32     7       41     2       50     1

*Based upon data from Practical Clinical Psychology by Dr. Edward Strecker and
Dr. Franklin Ebaugh, by permission of P. Blakiston and Son. Copyright by
P. Blakiston and Son.

6. OGIVES AND THEIR USES 

The purpose of forming a histogram or a frequency polygon is primarily
illustrative. Both are used to demonstrate, to the investigator or to the
public, information already contained in the frequency tabulation, rather
than to secure further information. Now we will describe another kind
of graphical representation, called an ogive, which is occasionally used for
purposes of illustration, but which has for its primary purpose the securing
of additional information about the distribution.

The need for this new kind of representation arises when the investigator
wishes to present the data in such a way that it will make clear the
relative position of any individual in the group. Suppose that an applicant
for employment tells you that he made a score of 72 in a mathematics
examination which was given to his high school class just before graduation.
With this information alone, you will have no more than a hazy notion of his
ability. For all you know, the test may have been so easy that only the
poorest men made scores as low as 72, or it may have been so difficult
that a score of 72 indicates high ability. However, if the candidate gives
you the further information that 83 per cent of the men in the class made
lower scores than he did, you now have a definite basis for judgment. It
is the purpose of this section to show how such relative rankings can be
quickly computed for any frequency tabulation.

In describing the details of constructing an ogive, let us use the following
specific problem. A civil service examination was given and it was
announced that the highest 30 per cent of the examinees could expect
immediate employment and that the next 40 per cent could expect positions
to open up for them, in the order of their standing, during the next
twenty-four months. The results of the examination are given in Table 2-6-1.


Table 2-6-1. Examination Scores

    Boundaries    Limits      f    cum f    % cum f
       69.5                          0          0
                  70-72       2
       72.5                          2          4
                  73-75       5
       75.5                          7         13
                  76-78      14
       78.5                         21         38
                  79-81       8
       81.5                         29         53
                  82-84      15
       84.5                         44         80
                  85-87       3
       87.5                         47         85
                  88-90       6
       90.5                         53         96
                  91-93       2
       93.5                         55        100

What is the most effective way of converting each examination score into
a score showing the applicant's standing within the group?

First let us define the cumulative frequency (cum f) as the total number
of variates which are below any given value of x, and the percentage
cumulative frequency (% cum f) as the cumulative frequency reduced to a
percentage basis by dividing by N. We compute these quantities as follows.

(1) List the boundaries, limits, and frequencies, with the boundaries
staggered with respect to the limits and frequencies as shown in Table
2-6-1.

(2) Write a zero for the cumulative frequency of the lower boundary
of the first class. Find the cum f for each other boundary by adding each
f to the preceding cum f, as shown by the arrows. For example, the cum
f of 72.5 is 0 + 2 = 2, that at 75.5 is 2 + 5 = 7, that at 78.5 is 7 + 14 =
21, and so forth. Notice that the values of cum f are written between the
classes, opposite the boundaries.

(3) The last number in the cum f column is N, the total number of
variates (55 in the example). Divide each entry in the cum f column by
N in order to obtain the percentage cumulative frequencies. These may
be expressed as percentages, as shown in the example, or they may be
left in the form of fractions or decimals.

(4) Plot an ogive or a percentage ogive of the data. An ogive is a graph
of the cum f values plotted against the corresponding boundaries, and a
percentage ogive is a similar graph in which percentage cumulative
frequencies are used. Examples of each are shown in Figures 2-6-1 and 2-6-2.
Notice particularly that each cum f or % cum f is plotted against a
boundary, and not against a class mark.
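Steps (1) to (3) can be sketched in Python as follows. The boundaries and
frequencies are those of Table 2-6-1; the variable names are illustrative
only, and the rounding to whole percentages is an assumption made to match
the table.

```python
from itertools import accumulate

# Class boundaries and class frequencies from Table 2-6-1.
boundaries = [69.5, 72.5, 75.5, 78.5, 81.5, 84.5, 87.5, 90.5, 93.5]
frequencies = [2, 5, 14, 8, 15, 3, 6, 2]

# Step (2): zero at the first boundary, then a running total of f.
cum_f = [0] + list(accumulate(frequencies))

# Step (3): the last cum f is N; divide each entry by N.
n = cum_f[-1]
pct_cum_f = [round(100 * c / n) for c in cum_f]

# Each cumulative value is paired with a boundary, not a class mark.
for b, c, p in zip(boundaries, cum_f, pct_cum_f):
    print(f"{b:5.1f}  {c:3d}  {p:3d}")
```

Plotting cum_f or pct_cum_f against boundaries gives the points of the
ogive or of the percentage ogive described in step (4).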

Let us illustrate the use of an ogive by determining the standing of an
examinee who made a score of 78. We enter the graph at 78 on the horizontal
scale, go vertically up to the graphed line, and then horizontally
to the cum f scale, as indicated by the arrows in Figure 2-6-1. We find
that the value of cum f corresponding to 78 is 18, and this tells us that of
the fifty-five examinees, only eighteen had scores lower than 78. If we
perform the same operation on the percentage ogive (Figure 2-6-2), we
obtain the same information in a more useful form: 34 per cent of the
examinees made lower scores than 78. The examinee who made 78 is
therefore not quite out of the running, but will have to wait almost
twenty-four months for an opening.
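Reading the ogive at a score amounts to interpolating along the straight
line joining two plotted points. The sketch below assumes the points are
connected by straight lines; the function name cum_below is hypothetical.

```python
from bisect import bisect_right

# Points of the ogive for Table 2-6-1: cum f plotted against boundaries.
boundaries = [69.5, 72.5, 75.5, 78.5, 81.5, 84.5, 87.5, 90.5, 93.5]
cum_f = [0, 2, 7, 21, 29, 44, 47, 53, 55]


def cum_below(x):
    """Estimate cum f at score x by straight-line interpolation."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(cum_f[-1])
    i = bisect_right(boundaries, x) - 1          # class containing x
    x0, x1 = boundaries[i], boundaries[i + 1]
    return cum_f[i] + (cum_f[i + 1] - cum_f[i]) * (x - x0) / (x1 - x0)


count = cum_below(78)                 # between 18 and 19 examinees
percent = 100 * count / cum_f[-1]     # about 34 per cent
print(count, percent)
```

The interpolated values are close to those read from the graphs in the
text; a reading taken by eye from a drawn figure cannot be expected to
agree to more than about one unit.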

The information described above can be conveyed in a very succinct
way by expressing the examinee’s standing in “percentile” form. The
percentile rating of an individual within a group is the percentage of the
membership of a group who have a lower standing than his. The examinee
who made a score of 78 is at the thirty-fourth percentile, or, to state it more
briefly,

    \( 78 = P_{34} \)

A percentage ogive is also useful for the inverse problem of finding the
score corresponding to a given percentile. If, for example, it is found that
there are supervisory positions open for the top 25 per cent of the examinees,
then the qualifying score for such a position can be read immediately from
the ogive by entering it at 75 per cent on the vertical scale and reading the
corresponding examination score, 84, as shown by the arrows in Figure
2-6-2.

Figure 2-6-2. A Percentage Ogive.

In addition to percentiles there are several other standard measurements
of relative standing, as follows:

Quartile. The first quartile (\(Q_1\)) is the value of x such that one-fourth
of the variates lie below it; it is, in other words, the twenty-fifth percentile.
Similarly, \(Q_3\) is the seventy-fifth percentile.

Deciles, octiles, and so forth, are defined similarly. For instance, the
third decile is the value of x such that three-tenths of the variates lie below
it, and the seventh octile is the value of x such that seven-eighths lie below it.

Median. The median (M) is the fiftieth percentile. From the arrows in
Figure 2-6-2 we see that for this data, M is 81.2.
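The inverse reading, from a percentile to a score, can be sketched in the
same way. The code assumes an ogive made of straight-line segments through
the points of Table 2-6-1; the function name score_at_percent is
hypothetical. Quartiles, deciles, and octiles are obtained by passing the
corresponding percentage.

```python
boundaries = [69.5, 72.5, 75.5, 78.5, 81.5, 84.5, 87.5, 90.5, 93.5]
cum_f = [0, 2, 7, 21, 29, 44, 47, 53, 55]


def score_at_percent(p):
    """Return the score below which p per cent of the variates lie."""
    if not 0 <= p <= 100:
        raise ValueError("p must lie between 0 and 100")
    target = p / 100 * cum_f[-1]                 # cum f wanted
    for i in range(len(cum_f) - 1):
        lo, hi = cum_f[i], cum_f[i + 1]
        if lo <= target <= hi and hi > lo:
            width = boundaries[i + 1] - boundaries[i]
            return boundaries[i] + width * (target - lo) / (hi - lo)
    raise ValueError("percentile not reached by the tabulation")


qualifying = score_at_percent(75)    # top 25 per cent: about 84
first_quartile = score_at_percent(25)
third_decile = score_at_percent(30)
print(qualifying, first_quartile, third_decile)
```

The qualifying score comes out just under 84, in agreement with the reading
taken from Figure 2-6-2 in the text.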



Figure 2-6-3. A Smoothed Percentage Ogive.

If the ogive is to be used for estimating the percentile ratings of future
examinees, then it may be well to draw a smoothed curve through the
points, making the best possible compromise with all of them, but not
necessarily going through them, as shown in Figure 2-6-3. In this way
we avoid the local irregularities and retain only the general shape of the
ogive. In doing this we assume that the small irregularities are characteristic
only of the particular sample of fifty-five examinees and cannot be
expected to recur in future samples. In using a smoothed ogive in this
way, we are again estimating the properties of the universe from which
the sample was taken. More exact methods of making such estimates will
be discussed in later chapters.






INTRODUCTION TO THE THEORY OF STATISTICS 




It should be noted that percentile ratings can be used either to give
information about the standing of an individual within the distribution,
or to give information about the distribution itself. We learn something
about the distribution of incomes in a given community if we are told that
the fiftieth percentile, or median, is $3300; we learn more about the
distribution if we find also that the first quartile is $3100 and the third
quartile is $3900. The distribution could be described completely by listing
all the percentiles.

PROBLEMS 

1. Draw a smoothed percentage ogive for the data in Table 1-4-4, second
column, using classes 45 to 49, 50 to 54, and so forth. What is the median of this
distribution? What is the first quartile? The ninth decile? The seventy-third
percentile?

2. Draw a smoothed percentage ogive of the grades in the third column of the
same table. What is the percentile rating in language of the fifth man on the list?
How does this compare with his percentile rating in mathematics?

3. Draw a sketch showing the general shape which a histogram must have if
there is only a small interval between the first quartile and the median, and a
much larger interval between the median and the third quartile.

4. Draw a smoothed percentage ogive of the ages of dementia praecox
admissions in Table 2-5-1. Retain this ogive for future use.


Table 2-7-1. The Lorenz Curve

                                  Income       Cum     % cum            % cum
      Income         f     CM    of Group    Income    Income   cum f     f
        (1)         (2)    (3)      (4)        (5)       (6)     (7)     (8)
                                                  0        0       0       0
   $2000- 3000       3    2500     7500
                                               7500        2       3       6
    3000- 4000       6    3500    21000
                                              28500        6       9      18
    4000- 5000       8    4500    36000
                                              64500       13      17      34
    5000- 7000      15    6000    90000
                                             154500       32      32      64
    7000-10000       7    8500    59500
                                             214000       44      39      78
   10000-15000       5   12500    62500
                                             276500       57      44      88
   15000-25000       3   20000    60000
                                             336500       70      47      94
   25000-45000       1   35000    35000
                                             371500       77      48      96
   45000-65000       2   55000   110000
                                             481500      100      50     100
7. LORENZ CURVE

For the special purpose of displaying the degree of inequality of
distribution of income, wealth, or property, a particular sort of graphical
representation has been devised. It is called a Lorenz curve and it can be
described most quickly by means of an example.

Table 2-7-1 shows the incomes of a sample of fifty men in a given
profession. To form a Lorenz curve we begin by multiplying the class mark
of each class by the frequency of that class, which gives us the total income
of the class (column 4); we then form the cumulative values of these total
incomes (column 5), and, finally, we reduce these cumulative incomes
to percentage values (column 6) by dividing them all by the total income,
which in this case is $481,500. These cumulative values are staggered
with respect to the original classes, to indicate that they refer to boundaries
between classes. Next we form the cumulative frequencies at each boundary
(column 7); and finally, we reduce these cumulative frequencies to
percentage values (column 8) by dividing them all by the total frequency,
which in this case is 50. A Lorenz curve of this data, shown in Figure
2-7-1, consists of a graph of the percentage cumulative income (column
6) plotted against the percentage cumulative frequency (column 8).
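The construction of the columns of Table 2-7-1 can be sketched in Python
as follows. The class marks and frequencies are taken from the table; the
variable names and the rounding to whole percentages are assumptions made
for the illustration.

```python
from itertools import accumulate

# Class marks and frequencies from Table 2-7-1.
class_marks = [2500, 3500, 4500, 6000, 8500, 12500, 20000, 35000, 55000]
frequencies = [3, 6, 8, 15, 7, 5, 3, 1, 2]

# Column 4: total income of each class.
group_income = [cm * f for cm, f in zip(class_marks, frequencies)]

# Columns 5 and 7: cumulative values at the class boundaries.
cum_income = [0] + list(accumulate(group_income))
cum_f = [0] + list(accumulate(frequencies))

# Columns 6 and 8: the same values as percentages of the totals.
pct_cum_income = [round(100 * v / cum_income[-1]) for v in cum_income]
pct_cum_f = [round(100 * v / cum_f[-1]) for v in cum_f]

# The Lorenz curve is pct_cum_income plotted against pct_cum_f.
for x, y in zip(pct_cum_f, pct_cum_income):
    print(f"{x:3d}  {y:3d}")
```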

The reading of a Lorenz curve is illustrated by the arrows in Figure 2-7-1.
If we wish to know what fraction of the total income goes to the lower paid
half of the people, we find 50 per cent on the horizontal scale, find the
corresponding point on the curve, and carry this over to the left-hand
scale, as shown by the arrows. In this way we find that the lower 50 per
cent of the earners receive only 23 per cent of the total salary. A simple
variation of this technique will tell us about the upper fractions. For
example, if we wish to know the percentage of the total income which is
received by the upper 10 per cent of the earners, we enter the curve at
90 on the horizontal scale and read the result on the vertical scale. Since
the lower 90 per cent of the earners receive 62 per cent of the total income,
it follows that the upper 10 per cent receive 38 per cent of the total income.

To interpret the general appearance of the Lorenz curve, we have only
to notice that if the total income were divided uniformly among all the
earners, the curve would become a straight line. Thus the amount of sag
away from the straight line is a measure of the non-uniformity of
distribution.
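The readings described above can be approximated by straight-line
interpolation between the plotted points of Table 2-7-1. This is a sketch
only: the function name income_share_of_lower is hypothetical, and a curve
drawn smoothly by hand will give slightly different readings from the
straight-line segments assumed here.

```python
# Columns 8 and 6 of Table 2-7-1: the plotted points of the Lorenz curve.
pct_people = [0, 6, 18, 34, 64, 78, 88, 94, 96, 100]
pct_income = [0, 2, 6, 13, 32, 44, 57, 70, 77, 100]


def income_share_of_lower(p):
    """Per cent of total income received by the lowest-paid p per cent."""
    for i in range(len(pct_people) - 1):
        x0, x1 = pct_people[i], pct_people[i + 1]
        if x0 <= p <= x1:
            y0, y1 = pct_income[i], pct_income[i + 1]
            return y0 + (y1 - y0) * (p - x0) / (x1 - x0)
    raise ValueError("p must lie between 0 and 100")


lower_half = income_share_of_lower(50)          # about 23 per cent
upper_tenth = 100 - income_share_of_lower(90)   # a little under 39 per cent
print(lower_half, upper_tenth)
```

The first figure agrees with the 23 per cent read from Figure 2-7-1; the
second differs by less than one unit from the 38 per cent read from the
drawn curve.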


PROBLEMS 

1. The following table shows the distribution of property ownership in a
certain community.

    Property Value        Number of
      per Owner             Owners
        $0-$500               10
      $500-$1000              12
     $1000-$2000               4
     $2000-$5000               7
     $5000-$10000              4
    $10000-$15000              3
    $15000-$20000              1
    $20000-$30000              4
    $30000-$40000              3
    $40000-$50000              2

Construct a Lorenz curve of this data.

2. Using the Lorenz curve which you constructed for Problem 1, find the
following: the fraction of the total property owned by (a) the upper 10 per cent
of the owners, (b) the lower 40 per cent of the owners, (c) the lower half of the
owners. (d) Find the fraction of the owners who own half of the total property.

8. SUMMARY

The primary objective of the methods described in this chapter is to
condense long tables of data into summarized forms which will (a) bring
out the important characteristics of the distribution and (b) facilitate
further work on the data. This objective is accomplished by sorting the
data into groups according to size. The following terminology is used.

(1) Variate. Each of a set of tabulated numbers which form the
statistician’s raw material is called a variate. The symbol x is used to mean
“any one of the variates.”

(2) Frequency tabulation. A table showing the abundance, or frequency
of occurrence, of variates of various sizes.

(3) Histogram. A graph presenting the data contained in a frequency
tabulation.

(4) Ogive. A graph showing the total number, or the percentage, of
variates falling below any given value of x.

(5) Class. Each group of variates in a frequency tabulation.

(6) Upper limit and lower limit (UL and LL). The largest and the
smallest values of x which are to be included in a given class.

(7) Upper boundary and lower boundary (UB and LB). The dividing
points between classes. The boundary between any two classes can be
found by adding the upper limit of the preceding class to the lower limit
of the following class and dividing by two.

(8) Class mark (CM). The midpoint of any class. It can be found by
adding the upper limit of the class to the lower limit of the class and
dividing by two.

(9) Class interval (C). The width of each class. It can be found by
subtracting the lower boundary of any class from the upper boundary of
the class. Do not subtract the lower limit from the upper limit.

(10) Frequency (f). The number of variates in any class.

(11) Cumulative frequency (cum f). The total number of variates below
any given value of x.

(12) Percentage cumulative frequency (% cum f). The percentage of
variates which lie below any given value of x.

The procedure for forming a frequency tabulation is as follows.

(1) Choose an appropriate class interval. The interval is generally
chosen in such a way as to divide the entire range of values of x into ten
or fifteen equal intervals, but there is considerable latitude in the choice,
depending upon the purpose of the tabulation. A detailed discussion of
the optimum size will be found at the end of Article 5.

(2) Choose an appropriate set of class limits. The limits should be
chosen in such a way as to facilitate the procedure of tabulation. The
limits 20 to 29, 30 to 39, and so forth, are obviously more convenient than
the limits 21 to 30, 31 to 40, and so forth, since in the former case it is
necessary to look at only the first digit in each number in order to assign
it to its proper class.

(3) Go through the list of variates in the raw data and for each variate
place a check mark in the appropriate class, as shown in Table 2-2-2.

(4) Count the number of check marks in each class and enter the total
for each class under the f column of the tabulation.
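Steps (1) to (4) can be sketched in Python as follows. The function name
tabulate and the list of scores are hypothetical; the sketch assumes
whole-number variates, none of which lies below the first lower limit.

```python
from collections import Counter


def tabulate(variates, lower_limit, interval):
    """Return (lower limit, upper limit, frequency) for each class.

    Assumes integer variates, all at least as large as lower_limit.
    """
    # Steps (3) and (4): one "check mark" per variate, counted by class.
    counts = Counter((x - lower_limit) // interval for x in variates)
    table = []
    for k in range(max(counts) + 1):
        ll = lower_limit + k * interval
        table.append((ll, ll + interval - 1, counts.get(k, 0)))
    return table


# Hypothetical raw data, tabulated with the limits 20 to 29, 30 to 39, ...
scores = [23, 37, 45, 29, 31, 38, 52, 34, 41, 36, 27, 48]
table = tabulate(scores, lower_limit=20, interval=10)
for ll, ul, f in table:
    print(f"{ll}-{ul}: {f}")
```

With limits of this kind the class index depends only on the first digit
of each score, which is the convenience noted in step (2).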

The procedure for constructing a histogram is as follows.

(5) Choose a horizontal scale such that your graph will cover
conveniently the range from the smallest to the largest of your variates.
Choose a vertical scale such that your graph will extend from zero to the
largest single frequency in your table.

(6) Locate on the horizontal scale the points corresponding to the class
boundaries. These points divide the line into a number of segments, each
of which corresponds to one of the classes.

(7) Upon each such segment, construct a column whose height
corresponds to the frequency of the class which it represents.
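As a rough illustration of steps (5) to (7), the sketch below prints a
histogram of the frequencies of Table 2-6-1 in text form. It is turned on
its side, so that each class occupies one row and the length of the row
plays the part of the height of the column; this layout is a convenience
of the sketch and not the form of the figures in this book.

```python
# Class limits and frequencies from Table 2-6-1.
limits = ["70-72", "73-75", "76-78", "79-81",
          "82-84", "85-87", "88-90", "91-93"]
frequencies = [2, 5, 14, 8, 15, 3, 6, 2]

# One row per class; the bar length is proportional to the frequency.
rows = [f"{lim:>5} | {'#' * f} {f}" for lim, f in zip(limits, frequencies)]
print("\n".join(rows))
```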

The procedure for constructing a percentage ogive is as follows.

(8) Assign a cum f of zero to the lower boundary of the first class.
Starting with this cum f, obtain the cum f of each of the other boundaries by
adding to the preceding cum f the frequency of the intervening class, as
described in Article 6.

(9) Divide each entry in the cum f column by the value of N (the total
number of variates), and enter the result in the % cum f column.

(10) Construct a graph in which the horizontal scale is x and the vertical
scale is % cum f. Plot each % cum f against the x of the corresponding
boundary between classes, as shown in Figure 2-6-2.

(11) Connect the points in this graph by straight lines, if you are
interested primarily in conclusions concerning the sample alone. Draw a
smooth curve through the points if you are interested primarily in
conclusions concerning the universe from which the sample was taken.

The procedure for using a percentage ogive is as follows.

(12) The percentile rating of an individual variate is found by locating
its value of x on the horizontal scale, following a vertical line from this
point up to the curve, following a horizontal line from this point on the
curve over to the % cum f scale, and reading the corresponding value on
this scale. This procedure is illustrated in Figures 2-6-2 and 2-6-3.

(13) The value of x corresponding to a given percentile rating is found
by the reverse of the procedure just described. Locate the percentile
rating on the % cum f scale, follow a horizontal line from this point over
to the curve, follow a vertical line from this point down to the x scale, and
read the value of x corresponding to this point. The procedure is
demonstrated in Figures 2-6-2 and 2-6-3.

(14) Deciles, quartiles, octiles, and the median are special cases of
percentiles and can be found by the procedure in (13) of this list. Definitions
of these quantities are found in Article 6.

If the frequency tabulation describes the number of individuals owning
various amounts of money, property, or income, and if the objective is to
illustrate the degree of inequality of distribution of wealth among the
individuals, then a Lorenz curve should be constructed. Detailed
directions for constructing and using a Lorenz curve are given in Article 7.

SOME MATHEMATICAL TOOLS


1. INTRODUCTION 

Solving problems in statistics is likely to become very laborious if the
work is done by primitive methods. It is recommended that you master
the use of logarithms and learn to use a slide rule, so that you can reduce
the labor of computation and free your mind for the mastery of basic
principles. This chapter contains instructions for the use of these and
other mathematical tools. Articles 2 to 8, inclusive, deal with the
elementary properties and uses of logarithms and slide rules, while the
remaining articles deal with mathematical symbols of special usefulness in
statistics. If you are already familiar with the uses of logarithms and slide
rules, go directly to Article 9.


2. LOGARITHMS

“A logarithm of a number is the power to which the base must be raised
in order to produce that number.” This is the formal definition of a
logarithm, but unless you are well grounded in mathematics this definition
may be meaningless or vague to you. Perhaps a better approach is to
inspect an actual primitive logarithm table and study its properties. In
the first column of Table 3-2-1 are listed the values of two squared (4),
two cubed (8), and so on up to \(2^7\), or 128. In the second column are listed
the exponents, or powers, to which 2 was raised in order to give the numbers
in the first column. The numbers in the second column are the logarithms
of the numbers in the first column, relative to the base 2.


Table 3-2-1. Some Logarithms of Base 2

      N     Log N
      4       2
      8       3
     16       4
     32       5
     64       6
    128       7


For example, the statement that the logarithm of 32 to the base 2 is 5
means simply that it is necessary to raise 2 to the fifth power in order to
obtain 32. This statement is commonly written in the abbreviated form

    \[ \log_2 32 = 5 \]

where the subscript indicates the base being used. If we had chosen the
number 3 instead of 2 for a base, the first column of Table 3-2-1 would
have consisted of the numbers \(3^2\), \(3^3\), \(3^4\), and so forth. The choice
of a base is governed by convenience, and the number 10 is generally chosen
because it is also the base of our number system.

The antilogarithm is the inverse of the logarithm.* Thus, the statement
that 32 is the antilog of 5 is equivalent to the statement that 5 is the log
of 32.

Now notice that with the aid of Table 3-2-1 we can multiply numbers
together by adding their logs and finding the antilog of the result. For
instance, if we wish to multiply 4 by 8, we can proceed as follows. The
log of 4 is 2, and the log of 8 is 3. The sum of these logs is 5. The antilog
of 5 is 32, which is the answer to the problem. This relationship can be
stated in the form

    \[ \log AB = \log A + \log B \tag{3-2-1} \]

Similarly, an inspection of the table will show that division of two numbers
is accomplished by subtraction of their logs:

    \[ \log (A/B) = \log A - \log B \tag{3-2-2} \]

Thus to divide 128 by 16, we subtract 4 (the log of 16) from 7 (the log of
128), and obtain 3. Opposite 3 in the table we find its antilog, 8, which is
the answer to the problem.

To square a number, or cube it, or raise it to any power, we multiply
its log by that power:

    \[ \log (A^n) = n \log A \tag{3-2-3} \]

For example, to find the value of \(4^3\), we multiply 2 (the log of 4) by 3 (the
power to which we wish to raise 4), obtaining 6. The antilog of 6 is 64,
which is the required answer.
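The three worked examples above can be checked with a small sketch in
Python that uses Table 3-2-1 as a lookup table. The dictionary names are
illustrative only.

```python
# Table 3-2-1 as a lookup: number -> log to the base 2.
log2_table = {4: 2, 8: 3, 16: 4, 32: 5, 64: 6, 128: 7}

# The antilog table is the same table read in the other direction.
antilog2_table = {log: n for n, log in log2_table.items()}

# Equation 3-2-1: multiply 4 by 8 by adding their logs.
product = antilog2_table[log2_table[4] + log2_table[8]]

# Equation 3-2-2: divide 128 by 16 by subtracting their logs.
quotient = antilog2_table[log2_table[128] - log2_table[16]]

# Equation 3-2-3: cube 4 by multiplying its log by 3.
power = antilog2_table[3 * log2_table[4]]

print(product, quotient, power)   # 32 8 64
```

A result whose log is not in the table (for example 4 times 128) cannot be
looked up, which is the limitation taken up in the next article.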


PROBLEMS 

1. Using Table 3-2-1, find the values of the following: (a) \(\log_2 128\),
(b) \(\log_2 8\), (c) \(\mathrm{antilog}_2\, 5\).

2. Evaluate the following logs and antilogs: (a) \(\log_5 25\), (b) \(\log_3 27\),
(c) \(\mathrm{antilog}_3\, 2\), (d) \(\mathrm{antilog}_5\, 3\).

3. Complete the following computations with the aid of Table 3-2-1 and the
laws of logarithms.

    (a) \(8^2 =\)                (d) \(4^3 =\)

    (b) \(4 \times 16 =\)        (e) \(8 \times 16 =\)

    (c) \(32/4 =\)               (f) \(128 \div 32 =\)


*For the sake of brevity, these are most frequently called simply the “log” and the
“antilog.”

3. LAWS OF EXPONENTS 

Table 3-2-1 is of course of little use in its present form, because it
contains the logs of only a few numbers. It does not, for example, tell us the
logs of any numbers between 4 and 8, or of any fractional or decimal
numbers. To see how the table can be extended to give us the log of any
number, we must study the properties of exponents a little further.

The basic equations governing operations with exponents are as follows:

    \[ A^x A^y = A^{x+y} \tag{3-3-1} \]

    \[ A^x / A^y = A^{x-y} \quad (A \neq 0) \tag{3-3-2} \]

    \[ (A^x)^y = A^{xy} \tag{3-3-3} \]

For example, equation 3-3-1 tells us that \(2^2\) times \(2^3\) equals \(2^5\), equation
3-3-2 tells us that \(2^7\) divided by \(2^4\) equals \(2^3\), and equation 3-3-3 tells us
that the square of \(2^3\) is \(2^6\). It will be noticed that these three equations
contain the same information as those in the preceding article, but that here
the information is expressed in the language of exponents rather than in the
language of logs.

These relationships have a simple meaning so long as the exponents are
integers. But, in order for our table of logarithms to be useful, we must
include cases in which the exponent takes on any value whatever, including
fractional or negative values. The expression \(3^4\) means 3 multiplied by
itself four times, or \(3 \times 3 \times 3 \times 3\). But what meaning can we assign to
\(3^{4.5}\), or to \(3^{-4}\), so that the three basic relations of exponents or logs will
continue to be valid?

We can obtain an answer to this question by examining the three
equations themselves. To assign a meaning to \(2^1\), for example, let us put A
equal to 2, x equal to 4, and y equal to 3 in equation 3-3-2:

    \[ 2^4 / 2^3 = 2^1 \]

But \(2^4/2^3\) is 16/8, or 2, and therefore 2 to the first power must equal 2
itself. In general,

    \[ A^1 = A \tag{3-3-4} \]

To assign a meaning to \(2^0\), let us put x equal to 4 and y equal to 4 in
equation 3-3-2:

    \[ 2^4 / 2^4 = 2^{4-4} = 2^0 \]

From this we see that 2 to the zero power must equal one. In general,

    \[ A^0 = 1 \quad (A \neq 0) \tag{3-3-5} \]

To assign a meaning to expressions containing negative exponents, we
can put x equal to 0 and y equal to n (where n means “any number
whatever”) in equation 3-3-2:

    \[ A^0 / A^n = A^{0-n} = A^{-n} \]

Since \(A^0\) is 1, this tells us that

    \[ A^{-n} = 1/A^n \quad (A \neq 0) \tag{3-3-6} \]

To assign a meaning to expressions like \(2^{1/3}\), let x equal 1/3 and let y
equal 3 in equation 3-3-3:

    \[ (2^{1/3})^3 = 2^1 = 2 \]

In other words, \(2^{1/3}\) is a number which, when cubed, is equal to 2. It is
therefore the cube root of 2. To generalize this result, let x equal 1/n and
let y equal n in equation 3-3-3:

    \[ (A^{1/n})^n = A \]

Now let us take the nth root of both sides of the equation:

    \[ A^{1/n} = \sqrt[n]{A} \tag{3-3-7} \]

By means of combinations of these equations, we can assign a meaning
to any power of any number, whether the power is a positive or negative
number, and whether it is a whole number or a fraction. For example, we
can evaluate \(2^{-3.5}\) as follows:

    \[ 2^{-3.5} = 1/(2^{3.5}) = 1/(2^3 \times 2^{1/2}) = 1/(8\sqrt{2}) = 0.08838 \]
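The evaluation above, and the meanings assigned by equations 3-3-5 to
3-3-7, can be checked numerically with a short Python sketch. The variable
names are illustrative, and floating-point arithmetic is only approximate,
so the comparisons are made to within a small tolerance.

```python
import math

# The worked example: 2 to the power -3.5, directly and by steps.
direct = 2 ** -3.5
by_steps = 1 / (2 ** 3 * math.sqrt(2))      # 1 / (8 * sqrt(2))
print(direct, by_steps)                     # both about 0.0884

# Equations 3-3-5, 3-3-6, and 3-3-7 for a sample base A and exponent n.
A, n = 7.0, 3
zero_power = A ** 0                         # 1
negative_power = A ** -n                    # equals 1 / A**n
root = A ** (1 / n)                         # cube root of A
print(zero_power, negative_power, root)
```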

PROBLEMS 

1. Evaluate the following, using the laws of exponents: (a) \(3^{-2}\), (b) \(4^{2.5}\),
(c) \(17^0\), (d) \(9^{1/2}\), (e) \(9^{-1.5}\).

2. Evaluate the following logs: (a) \(\log_2 (1/8)\), (b) \(\log_9 3\), (c) \(\log_4 (1/32)\).

4. CHOICE OF BASE 10 

From the preceding article we can see that it is possible, although
sometimes laborious, to compute the log of any number to any chosen base.
It would be possible, for example, to extend Table 3-2-1 to include the logs
of all numbers between 1 and 100, or between 1 and 1000, or any other
range. But however long we made the table, there would be other numbers
beyond its range, and the usefulness of the table for computing purposes
would be limited. To avoid this limitation we must find a method by
which we can tabulate the logs of only a few numbers and use these to find
the logs of other numbers when needed. This can be accomplished in a
particularly simple manner if we choose 10 as the base of the logarithm
system, as we will show below. For this reason the base 10 is used
universally whenever the logarithm table is to be used primarily as a
computing aid. Such logs are usually called “common logs.”

From the definition of a logarithm in Article 2, we see that

    \[ \log_{10} N = L \quad \text{if} \quad 10^L = N \]

Since \(10^2 = 100\), for instance, we see that

    \[ \log 100 = 2 \]*

Similarly,

    \[ \log 1000 = 3 \]

and

    \[ \log \tfrac{1}{10} = -1 \]

and

    \[ \log \sqrt{10} = 0.5000 \]

Tables of common logarithms are widely used in many branches of
applied mathematics. The various tables differ chiefly in the number of
significant figures retained. In a “six-place” table, for example, the log
of 7 is given as 0.845098, while in a “five-place” table it is given as 0.84510,
and in a “four-place” as 0.8451. A six-place table is accurate to about one
part in a million, and is used only where high precision is needed.
Five-place tables (which give an accuracy of about one part in 100,000) are
much more widely used. Where high accuracy can be sacrificed for the
sake of speed, four-place or even three-place tables may be used. All such
tables are used in the same way, and if you master the use of one table you
can readily adapt your technique to other tables when greater accuracy or
higher speed is desired. In this book we will use a four-place table, which
is accurate to about one part in 10,000, or to about one part in 5000 if the
“proportional parts” section of the table is used.
A table of four-place common logs is given m Appendix I This table 
contains the logs of 900 numbers, from 1 00 to 9 99 inclusive, rounded off 
to four decimals To find the log of a number not lying between these 
limits, we express the number as the product of two factors, one of which 
can be found in the table, and the other of which is a power of 10 We 
then find the log of the number by adding the logs of the two factors, 
according to equation 3-2-1 For example, the log of 2000 could be found 
as follows 


log 2000 = log (2 × 1000) = log 2 + log 1000 = 0.3010 + 3 = 3.3010

Or, to find the log of 0.02,

log 0.02 = log (2 × 1/100) = log 2 + log (1/100) = 0.3010 + (-2)

The value of this logarithm is 0.3010 minus 2, or -1.6990. It is
convenient, however, not to combine the positive and negative parts in this
way, but to keep them separate throughout the computation. For
convenience, it is customary to subtract and add enough so that the negative
part is 10 or a multiple of 10:

log 0.02 = 8.3010 - 10

*It is customary to omit the subscript when 10 is used as the base. Following
this convention, log 100 is understood to mean \(\log_{10} 100\).

The preceding paragraph explains the principle of finding logs of numbers
not in the table, but in practice it is more convenient to operate in
accordance with a set of rules which embody the necessary principles. Such a
set of rules is given below in detail.

TO FIND THE LOG OF A NUMBER 

1. Insert an arrow after the first non-zero digit in the number. This
arrow marks what we will call the “standard position.” Round the number
off, if necessary, so that it contains only three digits after the standard
position. For example, if we need the log of 342.68, as given by four-place
log tables, we write

log 3↑42.7 =

2. Count the number of digits between the standard position and the
decimal point. Record this number, leaving space for a decimal to follow:

log 3↑42.7 = 2.

If the decimal point happens to be exactly at the standard position, this
whole number is zero:

log 3.427 = 0.

If the decimal point is to the left of the standard position, this whole
number is negative. For example, in the number 0.003427 the decimal point is
three spaces to the left of the standard position, and the whole number is
therefore minus three, which we write as follows:

log 0.003427 = 7.       - 10

3. Read the four-digit number beginning with the digit preceding the
standard position (in our example, 3427). Find the first two of these (34)
in the left-hand column under N in the log table (see a in Table 3-4-1);
find the next digit (2) in the column headings under “Logarithms” (see b);
and find the last digit (7) in the column headings under “Proportional
Parts” (see c).

4. Step 3 identifies two columns and one row. Locate the two numbers
which are in this row and in these two columns. Add these two numbers
and write the result in the space reserved for the decimal part of the log.

Table 3-4-1. Using a Table of Logarithms

                  Logarithms (b)              Proportional Parts (c)
      N       0      1      2      3   ...   ...    6      7      8   ...
      33                   (d)                            (e)
(a) → 34    5315   5328   5340   5353  ...   ...    8      9     10   ...
      35

In our example the two numbers are 5340 (see d) and 9 (see e); the sum of
these is 5349. The final result then is

log 342.7 = 2.5349

For the other examples in item two, the completed logs are

log 3.427 = 0.5349

and

log 0.003427 = 7.5349 - 10
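
The four steps above can be sketched in Python. This is only an illustration under stated assumptions: the names `table_log` and `format_log` are hypothetical, the two dictionaries hold only the entries of row 34 that appear in Table 3-4-1, and the sketch therefore handles only numbers whose digits fall inside that excerpt (and whose whole number is not below minus ten).

```python
import math

# Row N = 34 of the four-place table, limited to the entries shown in Table 3-4-1.
LOG_ROW_34 = {0: 5315, 1: 5328, 2: 5340, 3: 5353}   # columns under "Logarithms"
PP_ROW_34 = {6: 8, 7: 9, 8: 10}                      # columns under "Proportional Parts"


def table_log(x):
    """Return (whole number, four-digit decimal part) of the common log of x."""
    # Steps 1 and 2: the whole number counts the digits between the standard
    # position and the decimal point (negative if the point lies to the left).
    whole = math.floor(math.log10(x))
    # Step 3: round to three digits after the standard position and read
    # the resulting four-digit number, e.g. "3427".
    digits = f"{x / 10 ** whole:.3f}".replace(".", "")
    if digits[:2] != "34":
        raise ValueError("only row 34 of the table is reproduced here")
    # Step 4: add the entry under "Logarithms" to the one under "Proportional Parts".
    decimal_part = LOG_ROW_34[int(digits[2])] + PP_ROW_34[int(digits[3])]
    return whole, decimal_part


def format_log(x):
    """Write the log in the book's style, keeping a negative part of 10 separate."""
    whole, decimal_part = table_log(x)
    if whole >= 0:
        return f"{whole}.{decimal_part:04d}"
    return f"{whole + 10}.{decimal_part:04d} - 10"


for number in (342.68, 3.427, 0.003427):
    print(number, "->", format_log(number))
```

Keeping the whole number and the decimal part as separate integers mirrors the rule of never combining the positive and negative parts of the log.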

TO FIND THE ANTILOG OF A NUMBER 

1. Read the decimal part of the logarithm and find the next smaller
number in the body of the table. For example, if we wish to find the
antilog of 6.5338 - 10, we find the number in the table which is just smaller
than 5338; this “next smaller number” is 5328 (see Table 3-4-1).

2. Compute the difference between these two numbers, and find this
difference (or the nearest number to it) in the same row under
“Proportional Parts.” In the example, the difference is 5338 minus 5328,
or 10, which we find in “Proportional Parts” opposite the row in which
we are working.

3. The first two digits of the required antilog are found under N at the
left-hand side of the row; the third digit is found under “Logarithms” at
the top of the column containing the “next smaller number”; the fourth
digit is found under “Proportional Parts” at the top of the column
containing the difference. In the example, the first two digits are 34, which
we find to the left of 5328; the third digit is 1, which we find above 5328;
and the fourth digit is 8, which we find above 10 in the proportional parts
section.

4. Place an arrow after the first of these digits; this marks the standard
position. Read the whole number in the original logarithm, and locate
the decimal point this many places to the right of the standard position.
(If the whole number is negative, then locate the decimal point this many
places to the left of the standard position.) In the example, the standard
position is between the 3 and the 4, and the whole number in the logarithm
is 6 minus 10, or minus 4. We therefore locate the decimal point four places
to the left of the standard position. Our final result is

antilog 6.5338 - 10 = 0.0003↑418
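
The antilog rules can be sketched the same way. As before this is a hedged illustration rather than a full table: `table_antilog` is a hypothetical name, the dictionaries contain only the part of row 34 shown in Table 3-4-1, and the result is meaningful only for decimal parts that this excerpt covers.

```python
# Row N = 34 of the four-place table, limited to the entries shown in Table 3-4-1.
LOG_ROW_34 = {0: 5315, 1: 5328, 2: 5340, 3: 5353}   # columns under "Logarithms"
PP_ROW_34 = {6: 8, 7: 9, 8: 10}                      # columns under "Proportional Parts"


def table_antilog(whole, decimal_part):
    """Antilog of a log given as a whole number and a four-digit decimal part."""
    # Step 1: the next smaller number in the body of the table.
    candidates = [(value, column) for column, value in LOG_ROW_34.items()
                  if value <= decimal_part]
    if not candidates:
        raise ValueError("decimal part lies below the reproduced row")
    base, third_digit = max(candidates)
    # Step 2: the difference, matched to the nearest proportional part.
    difference = decimal_part - base
    if difference == 0:
        fourth_digit = 0
    else:
        fourth_digit = min(PP_ROW_34,
                           key=lambda column: abs(PP_ROW_34[column] - difference))
    # Step 3: N supplies the first two digits, the column headings the rest.
    digits = f"34{third_digit}{fourth_digit}"
    # Step 4: the whole number places the decimal point relative to the
    # standard position, which is the same as a power of ten.
    return float(f"{digits[0]}.{digits[1:]}e{whole}")


print(table_antilog(6 - 10, 5338))
```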

In looking up logs or antilogs, you may find that your required number
falls exactly halfway between two digits. In such cases it is customary,
for the sake of uniformity, always to choose the even digit. For example,
the antilog of 0.3307 is exactly halfway between 2.141 and 2.142, and we
write

antilog 0.3307 = 2.142
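
The "choose the even digit" convention corresponds to what Python's `decimal` module calls `ROUND_HALF_EVEN`. The sketch below is an illustration only; the helper name `round_to_even` is hypothetical, and the halfway values are written as strings so that they are represented exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_to_even(value, places="0.001"):
    """Round a decimal string to the given place, sending exact ties to the even digit."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)


# 2.1415 is exactly halfway between 2.141 and 2.142; the even digit is 2.
print(round_to_even("2.1415"))   # 2.142
# 2.1425 is exactly halfway between 2.142 and 2.143; the even digit is again 2.
print(round_to_even("2.1425"))   # 2.142
```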

With a little practice all these operations can be carried out rapidly and
accurately, without writing down any of the intermediate numbers. Skill
in the use of logarithms can be acquired only by practice, and it is
recommended that you drill on these operations until you can carry them out
rapidly and without hesitation. The time which you spend on drill in
this way will be saved many times over in increased speed in later work.

You may prefer to interpolate directly in Appendix I, without using
the proportional parts section. If you do so, your results will be accurate
to within one part in 10,000. If you use the proportional parts table your
results will be accurate to within one part in 5000.

PROBLEMS 

1. Find the logs of the following numbers:

(a) 3497                (e) 3.841
(b) 296,400             (f) 146,300,000
(c) 0.000284            (g) 2.973
(d) 0.000009772         (h) 0.8160

2. Find the antilogs of the following numbers:

(a) 3.5872              (e) 6.2887
(b) 5.1775 - 10         (f) 0.9824
(c) 0.4087              (g) 9.4679 - 10
(d) 8.9739 - 10         (h) 7.5072 - 10

5. USES OF LOGARITHMS 

The basic operations for which logarithms are useful are those of
multiplication, division, and the finding of powers and roots of numbers, or
any combination of these operations. The details of such computations can
be learned most readily through the study of examples.

I. Multiplication

A press in a factory stamps out an average of 1463 metal parts per
working hour. How many will it turn out in a year of 49 operating weeks,
if the factory operates 42.5 hours per week?

To solve this problem, we must multiply 1463 times 49 times 42.5, and
equation 3-2-1 tells us that we must add the logs of these three numbers,
then look up the antilog of the result. We begin by arranging a
framework, or plan, for the computation:


log 1463   =
log 49     =
log 42.5   =             (+)
log answer =
Answer     =

We next look up the logarithms of the three numbers, insert them in the
framework of the computation, add them, and then look up the antilog of
the result. The completed computation looks as follows:

log 1463   = 3.1653
log 49     = 1.6902
log 42.5   = 1.6284      (+)
log answer = 6.4839
Answer     = 3,047,000

This number is accurate only to four significant figures, since we used
four-place logs. The first three digits can be relied upon absolutely, and
the fourth may contain a small error due to the accumulation of “rounding
off” errors. In other words, the output per year will probably be between
3,046,000 and 3,048,000. If the stamping machine turned out exactly 1463
parts per hour, and the factory always operated exactly 42.5 hours per
week for exactly forty-nine weeks, then it might be worth using five-place
logs to obtain a more accurate answer; but it is more likely that the original
numbers can be relied upon to only four significant figures or less, and in
this case it would be wasteful and misleading to carry out the computation
with higher accuracy.
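
The accuracy claim can be examined with a short Python sketch. It is an illustration under an assumption: `log4` (a hypothetical helper) rounds `math.log10` to four places in place of a table lookup, so an individual log may differ from the printed table value by one unit in the last place.

```python
import math


def log4(x):
    """Common log rounded to four places, standing in for a table lookup."""
    return round(math.log10(x), 4)


factors = [1463, 49, 42.5]
log_answer = sum(log4(f) for f in factors)   # equation 3-2-1: add the logs
answer_by_logs = 10 ** log_answer            # antilog of the sum
exact = math.prod(factors)                   # ordinary multiplication, for comparison

print(f"by four-place logs: {answer_by_logs:,.0f}")
print(f"exact product:      {exact:,.1f}")
print(f"relative error:     {abs(answer_by_logs - exact) / exact:.1e}")
```

The relative error stays well below one part in a thousand, which agrees with the statement that the first three digits can be relied upon.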

II. Division 

The distance from the earth to the sun is 92,900,000 miles. Light travels
at the rate of 186,000 miles per second. How long does it require sunlight
to reach us after it leaves the sun?

Here we must divide 92,900,000 by 186,000. Equation 3-2-2 tells us
that we must subtract the log of 186,000 from the log of 92,900,000, and
look up the antilog of the result. The computation is as follows:

log 92,900,000 = 7.9680
log 186,000    = 5.2695      (-)
log answer     = 2.6985
Answer         = 499.4

Thus the time required is 499 seconds, or a little over 8 minutes.

III. Powers 

A meteorological sounding balloon is to be 8.23 feet in radius. How
many cubic feet of gas will be required to fill it?

The formula for the volume of a sphere is \(\frac{4}{3} \pi r^3\), where r is the
radius of the sphere and \(\pi\) is 3.1416. To obtain the log of \(r^3\), we must
find the log of r and multiply it by 3, according to equation 3-2-3. The
complete computation is as follows:

log 8.23     = 0.9154
                    3        (×)
log 8.23^3   = 2.7462
log 3.142    = 0.4972
log 4        = 0.6021        (+)
               3.8455
log 3        = 0.4771        (-)
log answer   = 3.3684
Answer       = 2336

The required volume, rounded to three figures, is 2340 cubic feet.
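
The same computation can be sketched in Python as a check. The helper `log4` is hypothetical and rounds `math.log10` to four places instead of consulting Appendix I; the direct value uses `math.pi` rather than 3.142, so small differences in the fourth figure are expected.

```python
import math


def log4(x):
    """Common log rounded to four places, standing in for a table lookup."""
    return round(math.log10(x), 4)


radius = 8.23
log_r_cubed = 3 * log4(radius)                               # equation 3-2-3
log_volume = log_r_cubed + log4(3.142) + log4(4) - log4(3)   # log of (4/3) * pi * r^3
volume_by_logs = 10 ** log_volume
volume_direct = 4 / 3 * math.pi * radius ** 3

print(f"by four-place logs: {volume_by_logs:.0f} cubic feet")
print(f"direct computation: {volume_direct:.0f} cubic feet")
```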

IV. Roots 

A sunken storage tank is to be constructed in the form of a cube large
enough to hold 3500 cubic feet of liquid. What should the dimensions of
the tank be?

Since the volume of a cube is obtained by cubing the length of one side,
we must find the cube root of 3500. If we rewrite equation 3-3-7 in the
language of logs, we have

\(\log \sqrt[n]{A} = \frac{1}{n} \log A\)      (3-5-1)

Thus, to find the cube root of 3500, we must find the log of 3500, divide
it by 3, and look up the antilog of the result. The computation is as
follows:

log 3500   = 3.5441
         3 ) 3.5441
log answer = 1.1814
Answer     = 15.19 ft

V. Operations with Negative Logarithms

All the operations described above proceed in much the same way when
the whole number in the logarithm is negative. The details are illustrated
by the following examples.

(a) Multiply 0.008917 times 38.41 times 0.0868. The computation is
as follows:

log 0.008917 = 7.9502 - 10
log 38.41    = 1.5844
log 0.0868   = 8.9385 - 10
log answer   = 18.4731 - 20
Answer       = 0.02972

The whole number in the final log is 18 minus 20, or minus 2, and we insert
the decimal point two spaces to the left of the standard position.

(b) Divide 0.00287 by 0.746. The computation is as follows:

log 0.00287 = 17.4579 - 20
log 0.746   = 9.8727 - 10
log answer  = 7.5852 - 10
Answer      = 0.003848

The first log would normally be written 7.4579 - 10, but we add and
subtract 10 in order to yield a positive number when we subtract the second
log. In the second column, -20 - (-10) = -20 + 10 = -10.

(c) Find the cube of 0.8745. The computation is as follows:

log 0.8745 = 9.9418 - 10
log answer = 29.8254 - 30
Answer     = 0.6690

The whole number is 29 minus 30, or -1.

(d) Find the cube root of 0.0004692. The computations are as follows:

log 0.0004692 = 6.6714 - 10
            3 ) 26.6714 - 30
log answer    = 8.8905 - 10
Answer        = 0.07772

The first log would normally be written 6.6714 - 10, but since we must
divide by 3 we add and subtract enough so that the negative part will
equal minus 10 after the division is performed.
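
The bookkeeping in examples (b) and (d) can be sketched by carrying each log as a pair: a positive part and the multiple of ten to be subtracted. This is an illustration only; the function names are hypothetical, `Decimal` is used to keep the four-place figures exact, and `divide_log` assumes the negative part does not already exceed ten times the divisor.

```python
from decimal import Decimal


def subtract_logs(a, b):
    """Subtract log b from log a; each log is a (positive part, negative part) pair."""
    positive_a, negative_a = a
    positive_b, negative_b = b
    # Add and subtract 10 until the positive parts can be subtracted
    # without the result going negative, as in example (b).
    while positive_a < positive_b:
        positive_a += 10
        negative_a += 10
    return positive_a - positive_b, negative_a - negative_b


def divide_log(a, n):
    """Divide a log by n, leaving a negative part of exactly 10, as in example (d)."""
    positive, negative = a
    # Add and subtract tens until the negative part is 10 * n.
    while negative < 10 * n:
        positive += 10
        negative += 10
    quotient = (positive / n).quantize(Decimal("0.0001"))
    return quotient, negative // n


# Example (b): log 0.00287 minus log 0.746
print(subtract_logs((Decimal("7.4579"), 10), (Decimal("9.8727"), 10)))
# Example (d): one third of log 0.0004692
print(divide_log((Decimal("6.6714"), 10), 3))
```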


PROBLEMS

Perform the following computations by means of the log table in Appendix I.
For comparison, the times required by a moderately skilled computer for some of
the problems are shown at the right.

1. 384 × 19600

2. 0.004173 × 41.38 (Time 55 seconds)

3. 8468 ÷ 617

4. 0.003282 ÷ 0.000176 (Time 60 seconds)

5. 0.00624 ÷ 0.08718

6. \((1.572)^6\) (Time 35 seconds)

7. \(\sqrt{40.91}\)

8 A/ ^32150 (Time 45 seconds) 

9 a ^O 0005281 

_ 3 /SO 40 X 91 6 X 0 00264 /nn _ 

10 ^ Z ~ 2 1~X (0~ 0793) 2 Time 2 mmutes 55 seconds) 

6. POPULATION AND INTEREST PROBLEMS 

Logarithms are particularly useful in problems involving any quantity
which varies at a rate which is proportional to the quantity itself. For
example, if any colony of living organisms is provided with adequate food
and space, the number of new organisms per unit of time will be
proportional to the number of organisms already present. In particular, if a
bacteriological culture contains one million bacteria, and is growing at the
rate of 50,000 bacteria per hour, then it is reasonable to believe that in
another culture containing three million bacteria, the rate of increase will
be 150,000 per hour, since each million of the three millions will produce an
increase of 50,000. We can describe this by stating that the hourly rate of
increase is 5 per cent of the population.

Now let us suppose that a bacteriologist has estimated the population
of such a culture, on the basis of sample counts, to be 850,000, and that he
wishes to estimate the number which will be present after 40 hours if the
bacteria continue to increase in the same way.

We might obtain a rough idea as follows. The population increases 5
per cent per hour; therefore in 40 hours it will increase 40 × 5 per cent or
200 per cent. The increase during 40 hours would therefore be 1,700,000,
making a total of 2,550,000 at the end of 40 hours. This argument is
however inaccurate, because it assumes that the rate of increase is always
42,500 per hour, whereas in fact the rate of increase will change as the
population grows. A more accurate procedure would be to use the fact
that at the end of any given hour the population is 105 per cent of what
it was at the beginning of the hour; that is, that the hourly ratio of
increase is 105 per cent or 1.05. We could then proceed as follows:

Original population          850,000
                              × 1.05
Population after 1 hour      892,500
                              × 1.05
Population after 2 hours     937,125

and by continuing this through thirty-eight more steps we could find
accurately the population to be expected after 40 hours.
This procedure can be greatly accelerated by the use of logs. If we
start with the log of 850,000 and add the log of 1.05, we will have the log
of the population after one hour; if we again add the log of 1.05, we will
have the log of the population after two hours, and so forth. To solve
our problem we must add the log of 1.05 forty times, which of course we
can do in a single step:

log 1.05        = 0.0212
                      40     (×)
40 × log 1.05   = 0.8480
log 850,000     = 5.9294     (+)
log answer      = 6.7774
Answer          = 5,990,000
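
The three approaches in this article (the rough linear estimate, the forty repeated multiplications, and the single step with logs) can be compared in a short Python sketch. It is illustrative only; the variable names are hypothetical, and `math.log10` is used at full precision, with a four-place version shown alongside for comparison with the computation above.

```python
import math

start = 850_000
ratio = 1.05
hours = 40

# Rough estimate: a constant increase of 5 per cent of the original per hour.
linear_estimate = start + hours * 0.05 * start

# Step-by-step procedure: multiply by 1.05 forty times.
population = float(start)
for _ in range(hours):
    population *= ratio

# Single step with logs: log P_n = log P_0 + n log R.
by_logs = 10 ** (math.log10(start) + hours * math.log10(ratio))

# The same step with logs rounded to four places, as in a four-place table.
by_four_place_logs = 10 ** (round(math.log10(start), 4)
                            + hours * round(math.log10(ratio), 4))

print(f"linear estimate:     {linear_estimate:,.0f}")
print(f"repeated products:   {population:,.0f}")
print(f"full-precision logs: {by_logs:,.0f}")
print(f"four-place logs:     {by_four_place_logs:,.0f}")
```

The gap between the last two figures comes from multiplying the rounded log of 1.05 by 40, which magnifies its rounding error forty times.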


We can formulate this procedure verbally as follows: The log of the
population at any time is the log of the original population, plus the log of
the ratio of increase per interval times the number of intervals which have
elapsed. Or, if we let \(P_0\) stand for the original population, R the ratio of
increase in a given time interval, n the number of such intervals which
have elapsed, and \(P_n\) the population at the end of this time, then the
equation for \(P_n\) is

\(\log P_n = \log P_0 + n \log R\)      (3-6-1)

Any quantity which increases according to this equation is said to increase
geometrically. The use of the formula is illustrated by the following
examples.

(1) A city had a population of 58,900 in 1940 and a population of 61,700
in 1950. The owners of a department store plan to erect a new building
in this city, which must be used for at least twenty years in order to repay
its cost. In planning the size of the building, they wish to allow for
expansion of the population of the city in future years. What population
is to be expected by 1970?

To solve this problem, we must first find the annual ratio of increase,
or its log. The following computation is self-explanatory:

log 61,700                    = 4.7903
log 58,900                    = 4.7701      (-)
Increase in log in 10 years   = 0.0202
Increase in log per year      = 0.00202
                                     20     (×)
Increase in log in 20 years   = 0.0404
log of P in 1950              = 4.7903      (+)
log of P in 1970              = 4.8307
Population in 1970            = 67,720

The best estimate for the 1970 population, made from the data given here,
is thus seen to be 67,700, rounded off to three significant figures.

(2) The conditions for geometric increase are fulfilled exactly in the
increase of a sum of money lent at compound interest. We can use
equation 3-6-1 directly for this problem if we let n be the number of interest
periods, R the ratio of increase of the principal per interest period, and
\(P_0\) and \(P_n\) the original and final principal. For example, let us consider
the following problem.

A sum of $2300 is deposited in a bank, and draws interest at 5 per cent,
compounded semiannually. What will the principal be after twelve
years?

Since the interest is compounded semiannually, there will be twenty-four
interest periods during the twelve-year interval; therefore n is 24.
Five per cent interest per year is equivalent to 2½ per cent in six months;
therefore the ratio of increase during each interest period is 1.025. The
computations are as follows:*

log R      = 0.010724
                   24      (×)
n log R    = 0.257376
log P_0    = 3.3617        (+)
log P_n    = 3.6191
P_n        = $4160

(3) A man wishes to purchase an endowment policy to educate his
children. If he wishes to receive $10,000 from the policy fifteen years
from now, how much should he pay now for the policy in full if his money
will draw interest at 4 per cent compounded annually?

This problem differs from the previous one only in that we know \(P_n\)
and wish to obtain \(P_0\). If we solve equation 3-6-1 for \(P_0\) we obtain

\(\log P_0 = \log P_n - n \log R\)

We insert \(P_n\) = $10,000, n = 15, and R = 1.04; we find that \(\log P_0\) is
3.7445 and \(P_0\) is $5552.

*This computation requires logarithms of more than four-place accuracy for R, since
this log is to be multiplied by so large a number. We therefore use Appendix II, which
contains a few six-place logs for interest problems.
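
The three examples of this article all rest on the relation log P_n = log P_0 + n log R, and can be sketched together in Python. The function names below are hypothetical and `math.log10` is used at full precision, so the results agree with the table computations only to about the accuracy of the tables.

```python
import math


def project(p0, ratio, n):
    """P_n from log P_n = log P_0 + n log R."""
    return 10 ** (math.log10(p0) + n * math.log10(ratio))


def ratio_per_interval(p_start, p_end, intervals):
    """R from two observations: the increase in the log, divided by the number of intervals."""
    return 10 ** ((math.log10(p_end) - math.log10(p_start)) / intervals)


def present_value(pn, ratio, n):
    """P_0 from log P_0 = log P_n - n log R."""
    return 10 ** (math.log10(pn) - n * math.log10(ratio))


# Example (1): city population, 1940 to 1950, projected to 1970.
annual_ratio = ratio_per_interval(58_900, 61_700, 10)
print(f"population in 1970: {project(61_700, annual_ratio, 20):,.0f}")

# Example (2): $2300 at 5 per cent compounded semiannually for twelve years.
print(f"principal after 12 years: ${project(2300, 1.025, 24):,.2f}")

# Example (3): present cost of $10,000 due in fifteen years at 4 per cent.
print(f"present cost: ${present_value(10_000, 1.04, 15):,.2f}")
```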

PROBLEMS 

1. If the interest rate is 3 per cent, compounded annually, what should be the
present cost of an endowment policy if the beneficiary is to receive $10,000 at
the end of twenty years?

2. In the preceding problem, how much should the beneficiary receive if he
elects to cash the policy at the end of only fifteen years?

3. A Roman coin, estimated to be 2300 years old, was recently found near
Naples. It is estimated to have been worth about 5¢. How much would it have
been worth now if it had been deposited in a bank during the last 2300 years,
drawing interest at 3 per cent compounded annually?

7. SLIDE RULE 

A slide rule is a device for adding or subtracting logs mechanically. It
has the great advantage of high speed, but it is limited in accuracy to
about one part in a thousand. Since most of the computations to be made
in statistical problems require no more accuracy than this, a slide rule is
a very helpful tool for the statistician. It is recommended that you
purchase a slide rule* and master its elementary uses before proceeding to
the next chapter.

*A satisfactory rule can be purchased at any price from 35¢ to $15.00. The Keuffel
and Esser Student’s Slide Rule at about $3.50 is recommended for all-round use.

The principle of a slide rule can be grasped most readily as follows.
Let us suppose that we wish to multiply two numbers (say 2 and 3)
together by means of logarithms. We must add the log of 2 (which is 0.301)
to the log of 3 (which is 0.477) and find the antilog of the sum. We could
perform this addition mechanically by laying off 0.301 unit on a ruler,
then laying off 0.477 unit on another ruler, and then laying the two lengths
end to end and measuring the total length.

If the rulers used in the above process were to be used only for adding
logs, we could save time by printing “2” at a point 0.301 unit from the
end of each rule, “3” at a point 0.477 unit from the end, and so forth.
In this way we would avoid the necessity for looking up the logs which are
to be added. A slide rule actually consists of two rules printed in this
way, as shown in Figure 3-7-1. It will be noticed that the divisions on
the slide rule become closer and closer together as we go to the right; this
is a consequence of the fact that the logs increase more and more slowly
as we go to larger numbers.

The use of such a rule for multiplication is shown in Figure 3-7-1. If
we place the left end of the upper rule opposite “2” on the lower rule
(see a), then find “3” on the upper rule (see b), then read off the number
opposite this on the lower rule (see c), the result must be the number
whose log is log 2 plus log 3; it must in other words be the product of 2
times 3.

Figure 3-7-1. The Slide Rule.
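
The principle can be imitated numerically. In the sketch below (an illustration only; `scale_position` is a hypothetical name, and the scale is assumed to be one unit long from the mark “1” to the mark “10”), each number is printed at a distance equal to its common log, and multiplication becomes the addition of two distances.

```python
import math


def scale_position(number):
    """Distance from the left end of a unit-length scale to the mark for a number."""
    return math.log10(number)


# Lay the two lengths end to end, then read the number printed at that distance.
total_length = scale_position(2) + scale_position(3)
product = 10 ** total_length
print(round(scale_position(2), 3), round(scale_position(3), 3), round(product, 3))

# The marks crowd together toward the right end of the scale.
gaps = [scale_position(n + 1) - scale_position(n) for n in range(1, 10)]
print([round(gap, 3) for gap in gaps])
```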

In practice the difficulty of using a slide rule lies chiefly in the reading 
of the scales. The following example will be useful in mastering the 
technique. 

Illustrative Problem. Multiply 1.28 by 2.24. The successive steps (illustrated
in Figure 3-7-2) are as follows:

1. Locate 1.28 on the D scale. We find by inspecting the rule that the space
between the numbers 1 and 2 is separated into ten major divisions, labeled “1,”
“2,” and so forth. These therefore must stand for 1.1, 1.2, 1.3, and so forth, up
to 1.9. The number 1.28 is therefore between the small 2 and the small 3. We
see that the space between the small 2 and the small 3 is divided into ten
subdivisions; these must represent 1.21, 1.22, 1.23, and so forth up to 1.29. Our
number is therefore on the eighth dividing line in this interval.

Figure 3-7-2. Multiplication with a Slide Rule.

2. Slide scale C along until 1 on the C scale is opposite 1.28 on the D scale.

3. Find 2.24 on the C scale and place the runner, or movable vertical marker, on
it. Upon inspection of scale C, we see that the interval between 2 and 3 is divided
into ten major subdivisions, which must represent 2.1, 2.2, 2.3, and so forth. The
interval between 2.2 and 2.3 is divided into five subdivisions, and the fine dividing
lines must therefore represent 2.22, 2.24, 2.26, and 2.28. We place the runner
upon the second of these dividing lines, as shown in Figure 3-7-2.

4. Read the number on scale D which is now underneath the runner. We find
by an inspection of the subdivisions that the runner lies in the interval between
a dividing line representing 2.86 and one representing 2.88, and a careful
inspection will show that it lies approximately one-fourth of the way along this
interval. The entire interval represents 0.02, and one-fourth of this is 0.005, so
that the number represented by the position of the runner is 2.865, which is the
required answer, with some uncertainty in the final digit.

The computation described above takes far longer to describe than to perform.
With a little practice you should be able to perform such a computation in full
within fifteen or twenty seconds. The only pitfall in mastering this skill by
yourself will lie in misinterpretations of the meanings of the subdivisions on the
scales, and this pitfall can be avoided by careful inspection of the scales during
your learning period.


PROBLEMS 

Perform the following multiplications with a slide rule:

1. 3.81 × 1.92              4. 2.792 × 3.28

2. 1.745 × 4.85             5. 1.147 × 1.269

3. 1.019 × 9.17             6. 1.26 × 2.965

8. OTHER SLIDE RULE OPERATIONS

In the preceding section the use of a slide rule was illustrated only for
numbers lying between 1 and 10. For other numbers, a slight modification
of the procedure is necessary.

Example 1. Multiply 1972 by 316. Here we ignore the decimals, and multiply
1.972 by 3.16 in the usual way, obtaining 6.23. Next we round off both numbers
and mentally estimate their product, as follows. The first number is
approximately 2000, and the second is approximately 300, so that the answer
should be approximately 600,000. With this information we see that the answer
must be 623,000.

Example 2. Multiply 0.000278 by 1,392,000. Again we ignore the decimals
and multiply 2.78 by 1.392, obtaining 3.87. Here, however, the estimation of
the answer is more difficult, and it is better to rewrite the original numbers in the
following forms. 1,392,000 is equal to 1.392 times one million, which is ten raised
to the 6th power:

\(1{,}392{,}000 = 1.392 \times 10^{6}\)

Similarly,

\(0.000278 = 2.78 \times 10^{-4}\)

These exponents can be found quickly by counting the number of spaces that the
decimal is removed from the “standard position” referred to in Article 4. Now to
complete our problem it is only necessary to multiply \(10^{6}\) by \(10^{-4}\), and this is
equal (by 3-3-1) to \(10^{6-4}\), or \(10^{2}\). Our answer is therefore 3.87 times \(10^{2}\), or 387.

It is obvious that the operation of division can be performed by means
of a simple modification of the procedure described above. For
convenience, a set of specific rules for this and other operations is given
in Table 3-8-1.
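
The decimal-placing method of Example 2 can be sketched in Python. This is an illustration under assumptions: `to_standard_form` is a hypothetical helper, and rounding the product of the two factors to two decimals stands in for a three-figure slide rule reading.

```python
import math


def to_standard_form(x):
    """Split x into a factor between 1 and 10 and a power of ten."""
    exponent = math.floor(math.log10(abs(x)))
    return x / 10 ** exponent, exponent


factor_a, power_a = to_standard_form(0.000278)     # 2.78 and -4
factor_b, power_b = to_standard_form(1_392_000)    # 1.392 and 6

reading = round(factor_a * factor_b, 2)            # what the slide rule supplies
answer = reading * 10 ** (power_a + power_b)       # exponents add, by 3-3-1

print(reading, power_a + power_b, round(answer, 6))
```

For division the same helper applies, with the exponents subtracted instead of added.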


Table 3-8-1. Use of the Slide Rule

Multiplication. Place 1 on C opposite first factor on D; find second factor
on C and set runner on it; read answer on D under runner. (If the second factor
lies beyond the end of the D scale, start over, using the 1 at the right end of C
instead of the 1 at the left end.)

Division. Find numerator on D and place runner on it. Slide C until
denominator on C is under runner. Read answer on D opposite 1 on C.

Squaring. Find number on D and place runner on it; square of number is
found on A under runner.

Square Roots. Find number on A and place runner on it; square root is
found on D under runner. (To decide whether to use the left or the right half
of A, first make a mental estimate of about how large a number is to be expected.)

These operations are illustrated by the following examples.

1. Multiply 317 by 38.2. We place 1 on the C scale opposite 3.17 on the D
scale, and then locate 3.82 on the C scale. We find that this number falls beyond
the right-hand end of the D scale, so that the answer cannot be found in this way.
We therefore begin again, placing the 1 which is at the extreme right-hand end
of the C scale opposite 3.17 on the D scale. From here we proceed as usual, finding
3.82 on the C scale and reading the answer opposite it on the D scale. The
required answer is 12,110.

2. Divide 0.00416 by 0.0000237. We find 4.16 on the D scale and place the
runner on it. We then slide the C scale until 2.37 is under the runner. We read the
number on the D scale which is opposite 1 on the C scale; this number is 1.755.
To locate the decimal, we rewrite the problem in powers of 10. We wish to find
4.16 times \(10^{-3}\), divided by 2.37 times \(10^{-5}\). Since \(10^{-3}\) divided by \(10^{-5}\) is
\(10^{-3-(-5)}\) or \(10^{2}\), the answer is 1.755 times \(10^{2}\) or 175.5.

3. Find the square of 0.00796. We find 7.96 on D and place the runner upon it.
Under the runner on scale A we find the number 634. To locate the decimal, we
write 0.00796 in the form 7.96 times \(10^{-3}\). The square of \(10^{-3}\) is \(10^{-6}\) (equation
3-3-3), and the square of 7.96 is 63.4. Our result is therefore 63.4 times \(10^{-6}\), or
0.0000634.

4. Find the square root of 4.95. Here we find 4.95 on the left half of the A scale
(the corresponding point on the right half indicates 49.5), place the runner upon
it, and read the result under the runner on the D scale; it is 2.225.

5. Find the square root of 42,900,000. Here we can rewrite the number either
as 4.29 times \(10^{7}\) or as 42.9 times \(10^{6}\). The second form is better for our purposes
since it is easier to take the square root of \(10^{6}\) than of \(10^{7}\). We therefore find the
square root of 42.9 (using the right half of scale A). This square root is 6.55, and
the square root of \(10^{6}\) is \(10^{3}\). Our answer is therefore 6.55 times \(10^{3}\), or 6550.

PROBLEMS

Perform the following operations with a slide rule:

1. 417 × 3.92                    8. \(\sqrt{184}\)

2. 5680 × 0.00291                9. \(\sqrt{1840}\)

3. 847 ÷ 15.28                  10. \(\sqrt{278{,}000}\)

4. 0.000642 ÷ 17920             11. \(\sqrt{0.00278}\)

5. 58,900 ÷ 0.00895             12. \(\dfrac{352 \times 19.74}{11.48}\)

6. \(7.97^2\)                   13. \(\sqrt{\dfrac{5.93 \times 64.2}{247 \times 0.0985}}\)

7. \((0.000588)^2\)




9. SUMMATION SYMBOL 

A very useful symbol in statistics is the Greek capital letter sigma
(\(\Sigma\)), which means “the sum of the following variates.” For example,
\(\sum x\) means “the sum of all the x’s,” and \(\sum xy\) means “the sum of all the
products of x times y.” For the data given in Table 3-9-1, for example,


Table 3-9-1. Data for Five Children

          1        2       3       4        5         6         7         8         9
                                          Height    Height    Grade     Height
        Height    Age      IQ    Grade     Plus     Minus     Times     Times      Age
                                           Age       Age      Height     Age     Squared
          x        y       z       C      x + y     x - y      4x        xy        y²

          53       10      80      4        63        43       212       530       100
          63       11     115      4        74        52       252       693       121
          51        9      95      4        60        42       204       459        81
          50        8     120      4        58        42       200       400        64
          58       12      75      4        70        46       232       696       144

Σ        275       50     485     20       325       225      1100      2778       510
Σ/N       55       10      97      4        65        45       220      555.6      102


\(\sum x\) means the sum of the heights of the five children, or 275, while \(\sum y\)
means the sum of their ages, or 50. In the same way, we see that \(\sum (x + y)\)
is the sum of the terms in the fifth column, or 325. From the remaining
columns, we see that \(\sum 4x\) is 1100, \(\sum xy\) is 2778, and \(\sum y^2\) is 510.

In the derivations of formulas, it is frequently desirable to rewrite these
composite sums in simpler forms. If, for example, we require the value
of \(\sum (x + y)\), and know already the values of \(\sum x\) and \(\sum y\), it would be
advantageous to be able to compute the former sum directly from the
latter two, without computing the separate values of x + y. To see that
this is possible, let us write out the meaning of these sums in equation
form. The symbol \(\sum x\) means “the first value of x plus the second value
of x plus . . . and so forth”:

\(\sum x = x_1 + x_2 + x_3 + \cdots + x_N\)      (3-9-1)

where N is the number of variates. Similarly,

\(\sum y = y_1 + y_2 + y_3 + \cdots + y_N\)

and

\(\sum (x + y) = (x_1 + y_1) + (x_2 + y_2) + \cdots + (x_N + y_N)\)

If we rearrange the terms on the right-hand side of the above equation,
we have

\(\sum (x + y) = (x_1 + x_2 + x_3 + \cdots + x_N) + (y_1 + y_2 + \cdots + y_N)\)

or, replacing the two parts of the right-hand side by their equivalents
in summation notation,

\(\sum (x + y) = \sum x + \sum y\)      (3-9-2)

The validity of this equation can be observed experimentally by noting 
that m Table 3-9-1 the sum of the x + y column (325) is equal to the sum 
of the x column (275) plus the sum of the y column (50). 

A similar equation, the proof of which will be left to the student, is

    $\Sigma(x - y) = \Sigma x - \Sigma y$    (3-9-3)

This equation can be experimentally verified by noting that the sum of
column 6 in Table 3-9-1 is equal to the sum of column 1 minus the sum of
column 2.

The next of this series of equations is

    $\Sigma Cx = C\,\Sigma x$    (3-9-4)

where C stands for any number which does not change as we go from one
entry to the next in our table. Such a number is called a constant. An
example is shown in the fourth column of Table 3-9-1, which gives the
grade in which each child is enrolled. Since all are in the fourth grade,
this value is a constant. We can verify equation 3-9-4 by observing that
the sum of column 7 is equal to four times the sum of column 1.

The last equation of this series is

    $\Sigma C = NC$    (3-9-5)

For example, we see that the sum of column 4 is 5 (the number of variates)
times 4 (the constant which we are summing).

Upon a first reading of the four equations above, the student is likely
to feel that they contain only trivial information which is obviously true.
As a warning against too uncritical an acceptance of them, let us point out
that we cannot apply the same procedure to the sum of a set of products.
Column 8 of Table 3-9-1 contains the products of x times y, and the sum
of these products is 2778. If we add the x and y columns first and then
multiply the results together we have 275 times 50, or 13,750. In other
words, it is not true that $\Sigma xy = (\Sigma x) \times (\Sigma y)$.
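
The checks described above can be repeated mechanically. The following is a minimal Python sketch, not part of the original text, that uses the x, y, and C columns of Table 3-9-1; the variable names are illustrative assumptions.

```python
# Columns x (height), y (age), and the constant C (grade) from Table 3-9-1.
x = [53, 63, 51, 50, 58]
y = [10, 11, 9, 8, 12]
C = 4
N = len(x)

sum_x, sum_y = sum(x), sum(y)

# Equations 3-9-2 to 3-9-5, each checked on these data.
rule_sum = sum(a + b for a, b in zip(x, y)) == sum_x + sum_y
rule_diff = sum(a - b for a, b in zip(x, y)) == sum_x - sum_y
rule_const_factor = sum(C * a for a in x) == C * sum_x
rule_const = sum(C for _ in range(N)) == N * C

# The analogous "rule" for products does not hold.
sum_xy = sum(a * b for a, b in zip(x, y))
product_of_sums = sum_x * sum_y

print(rule_sum, rule_diff, rule_const_factor, rule_const)
print(sum_xy, product_of_sums)
```

The four rule checks come out true, while the last two printed numbers differ, which is the point of the warning.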

If the variates are grouped into a frequency tabulation, then it is
necessary in finding $\Sigma x$ to multiply each value of x by the number of times
which it occurs, that is, by its frequency. For example, in Table 3-9-2,
we obtain $\Sigma x$ by first multiplying each entry in the x column by the
corresponding number in the f column, entering the result in the fx column,
and adding the results. We see that $\Sigma x$ for this table is 250. This
convention will be used throughout the book, and you should remember that
the summation symbol, when applied to a frequency tabulation, implies that
each variate is to be multiplied by its frequency before adding.


Table 3-9-2. Summation of a Frequency Table

             x       f      fx

             4       3      12
             7       4      28
            10       9      90
            13       8     104
            16       1      16

$\Sigma$                   25     250
$\Sigma/N$                        10

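
As a small illustration of this convention, the following Python sketch (an addition, with illustrative variable names) weights each value in Table 3-9-2 by its frequency before summing.

```python
# Values and frequencies from Table 3-9-2.
x = [4, 7, 10, 13, 16]
f = [3, 4, 9, 8, 1]

N = sum(f)                                   # total number of variates
sum_x = sum(fi * xi for fi, xi in zip(f, x))  # each x multiplied by its frequency
mean_x = sum_x / N

print(N, sum_x, mean_x)
```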

PROBLEMS 

1. Write a detailed proof of equation 3-9-3.

2. Write a detailed proof of equation 3-9-4.

3. Write a detailed proof of equation 3-9-5.

4. Find the value of $\Sigma(2x - 3y)$ in Table 3-9-1, and check your result by
equations 3-9-4 and 3-9-3.

5. Compute the value of $\Sigma x^2$ in Table 3-9-1.

6. Compute the value of $(\Sigma x)^2$ in Table 3-9-1, and compare it with the value
of $\Sigma x^2$ obtained in the preceding problem.

7. Find the value of $\Sigma x$ in Table 3-9-2.

10. SYMBOL FOR ARITHMETIC MEAN

In statistics the term average is used rather loosely to indicate any
single value which is selected because it is in some way representative of
its group, and there are several ways of selecting such a representative
value. The term arithmetic mean is used to indicate a specific kind of
average, namely, that which is obtained by adding the variates and
dividing the sum by the number of variates. The arithmetic mean of a
variate x is denoted by the symbol $\bar{x}$, and is defined by the equation

    $\bar{x} = \Sigma x / N$    (3-10-1)

where N is the total number of variates. For the data in Table 3-9-1
there are five variates, so that $\bar{x}$ is equal to 275 divided by 5, or 55. In
Table 3-9-2 there are twenty-five variates, as we see by adding the
frequency column, and $\bar{x}$ is equal to 250 divided by 25, or 10.

The bar symbol can be used to indicate the mean of any variate or any
combination of variates. For example, we can obtain the value of $\overline{x + y}$
by dividing the sum of the fifth column of Table 3-9-1 by 5. In this way
we see that $\overline{x + y} = 65$; and, similarly, $\overline{x - y} = 45$, $\overline{xy} = 555.6$, and
$\overline{y^2} = 102$.

When we need an arithmetic mean of a composite quantity it is sometimes
possible to obtain it by combining the means of the components.
To prove that $\overline{x + y}$ can be obtained in this way, let us write it in
summation form by applying equation 3-10-1:

    $\overline{x + y} = \Sigma(x + y)/N$

This sum can be separated by applying equation 3-9-2:

    $\overline{x + y} = \Sigma x/N + \Sigma y/N$

and these two sums can be expressed as arithmetic means by applying
equation 3-10-1, giving us, finally,

    $\overline{x + y} = \bar{x} + \bar{y}$    (3-10-2)

By applying this technique to equations 3-9-3, 3-9-4, and 3-9-5, we obtain
the following set of rules for manipulating the symbol for arithmetic mean.
The detailed proofs are left to the student.

    $\overline{x - y} = \bar{x} - \bar{y}$    (3-10-3)

    $\overline{Cx} = C\bar{x}$    (3-10-4)

    $\bar{C} = C$    (3-10-5)

The row labeled $\Sigma/N$ in Table 3-9-1 will show you what these equations
mean in practice. We see that $\bar{x}$ is 55 and $\bar{y}$ is 10. From the fifth column
we see that $\overline{x + y}$ is 65, as required by equation 3-10-2, and from the
sixth column we see that $\overline{x - y}$ is 45, as required by equation 3-10-3. From
the seventh column we see that $\overline{4x}$ is 220, as required by equation 3-10-4;
and from the fourth column we see that the mean of the constant 4 is 4, as
required by equation 3-10-5.

These four equations will be used frequently in the derivations of later
formulas, and they should be studied carefully. You may prefer to
remember them in verbal form, as follows:

Equation 3-10-2. The mean of a sum of two variates is the sum of
their means.

Equation 3-10-3. The mean of the difference of two variates is the
difference of their means.

Equation 3-10-4. The mean of a constant times a variate is the constant
times the mean of the variate.

Equation 3-10-5. The mean of a constant is that constant.

The reader should be cautioned against careless use of the symbol for
the arithmetic mean, since expressions which look almost alike may have
totally different values. The following specific examples may be noted.

1. The mean of the square of a variate is written $\overline{x^2}$, and the square of the
mean of a variate is written $\bar{x}^2$, and the two are not in general equal. In
Table 3-9-1, for example, $\bar{y}$ is 10, and therefore $\bar{y}^2$ is 100. The value of $\overline{y^2}$,
on the other hand, is 102, as we can see from the last column of the table.

2. The mean of the product of x and y is written $\overline{xy}$, while the product
of the means of x and y is written $\bar{x}\bar{y}$. Again the two are not in general
equal. In Table 3-9-1, for example, $\bar{x}\bar{y}$ is 55 times 10, or 550, while $\overline{xy}$
is 555.6.
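
The rule for the mean of a sum and the two cautions can be checked numerically. The sketch below is an illustrative addition using the x and y columns of Table 3-9-1; the helper `mean` is a hypothetical name, and exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean as an exact fraction (equation 3-10-1)."""
    values = list(values)
    return Fraction(sum(values), len(values))

x = [53, 63, 51, 50, 58]
y = [10, 11, 9, 8, 12]

mean_x, mean_y = mean(x), mean(y)

# Equation 3-10-2: the mean of a sum is the sum of the means.
mean_of_sum = mean(a + b for a, b in zip(x, y))

# Caution 1: mean of the square versus square of the mean.
mean_of_square = mean(b * b for b in y)
square_of_mean = mean_y ** 2

# Caution 2: mean of the product versus product of the means.
mean_of_product = mean(a * b for a, b in zip(x, y))
product_of_means = mean_x * mean_y

print(mean_of_sum, mean_of_square, square_of_mean)
print(float(mean_of_product), product_of_means)
```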

PROBLEMS 

1. Write a formal proof of equation 3-10-3.

2. Write a formal proof of equation 3-10-4.

3. Write a formal proof of equation 3-10-5.

4. Find the value of $\bar{z}$, $\overline{x + z}$, and $\overline{yz}$ in Table 3-9-1. How does $\overline{yz}$ compare in
size with $\bar{y}\bar{z}$?


11. USES OF SUMMATION SYMBOL 

In order for you to follow the derivations of formulas later in the book,
it will be necessary for you to develop some skill in the manipulation of
these symbols. The purpose of this article is to provide you with an
opportunity to develop this skill, and at the same time to derive some
auxiliary formulas which will be useful in later developments.

Example 1. If n is a variate which takes on the values 1, 2, 3, 4, and so forth,
up to N, what is the value of $\bar{n}$? To solve this problem we begin by writing the
definition of $\Sigma n$ in expanded form as follows:

    $\Sigma n = 1 + 2 + 3 + \cdots + N$

Now let us rearrange these terms in pairs, pairing the first term with the last term,
the second term with the next to last term, and so forth:

    $\Sigma n = [1 + N] + [2 + (N - 1)] + [3 + (N - 2)] + \cdots$

Now notice that if N is even there are N/2 of these pairs, and that each pair has
the same total value, namely N + 1. We can therefore write the sum in the form

    $\Sigma n = (N/2)(N + 1)$

The mean of n is equal to $\Sigma n$ divided by the number of variates, which is N.
Upon dividing by N, we obtain

    $\bar{n} = (N + 1)/2$    (3-11-1)

For example, the mean of the numbers 1 to 100 is (100 + 1)/2, or 50½. The proof
for odd values of N is left to the student as an exercise.

Example 2. If n is a variate which takes on the values 1, 2, 3, 4, and so forth,
up to N, what is the mean value of $n^2$? This apparently simple problem can be
solved only by an indirect approach. We begin by observing that
$n^3 - (n - 1)^3 = 3n^2 - 3n + 1$, as we can verify by multiplying out the cube on
the left. Now if we sum both sides of this equation we will have

    $\Sigma[n^3 - (n - 1)^3] = \Sigma[3n^2 - 3n + 1]$

The sum on the left can be evaluated by inspection, since it consists of the series
of terms $1^3 - 0^3$, $2^3 - 1^3$, $3^3 - 2^3$, and so forth, up to $N^3 - (N - 1)^3$. When we
sum these terms, the last part of each term cancels the first part of the preceding
term, leaving only $N^3$ for the sum of the series. Inserting this value for the sum of
the left-hand side, we have

    $N^3 = \Sigma(3n^2 - 3n + 1)$

or, dividing both sides by N,

    $N^2 = \Sigma(3n^2 - 3n + 1)/N$

        $= \overline{3n^2 - 3n + 1}$    (by 3-10-1)

        $= \overline{3n^2} - \overline{3n} + \bar{1}$    (by 3-10-2)

        $= 3\overline{n^2} - 3\bar{n} + 1$    (by 3-10-4 and 3-10-5)

Our objective is to find $\overline{n^2}$, so that we must now solve for this quantity. The result
is

    $\overline{n^2} = (N^2 + 3\bar{n} - 1)/3$

or, if we insert the value of $\bar{n}$ from equation 3-11-1 and simplify,

    $\overline{n^2} = (2N^2 + 3N + 1)/6$    (3-11-2)
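
Equations 3-11-1 and 3-11-2 can be spot-checked by brute force. The following Python sketch is an illustrative addition (the function names are assumptions); it compares the closed forms with directly computed means for both even and odd N, using exact fractions.

```python
from fractions import Fraction

def mean_n(N):
    """Mean of 1, 2, ..., N computed directly from the definition."""
    return Fraction(sum(range(1, N + 1)), N)

def mean_n_squared(N):
    """Mean of 1^2, 2^2, ..., N^2 computed directly from the definition."""
    return Fraction(sum(n * n for n in range(1, N + 1)), N)

# Compare with the closed forms 3-11-1 and 3-11-2 for N = 1, ..., 50.
for N in range(1, 51):
    assert mean_n(N) == Fraction(N + 1, 2)
    assert mean_n_squared(N) == Fraction(2 * N * N + 3 * N + 1, 6)

print(mean_n(100))  # the example in the text: 101/2
```

A finite check of this kind is not a proof, but it guards against algebraic slips in the derivation.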

PROBLEMS 

1. Find the arithmetic mean of all the numbers from 1 to 11, inclusive. From
1 to 5299, inclusive.

2. Find the mean of the squares of the numbers from 1 to 9, inclusive. From
1 to 45, inclusive.

3. Derive an equation for $\overline{n^3}$, using the methods of the preceding proof. (Hint:
begin with the expression $n^4 - (n - 1)^4$; expand this expression and sum the
resulting equation.)

4. Derive equation 3-11-1 for odd values of N.

12. SUMMARY 

Chapter 3 consists of two primary topics: first, the use of logs and slide 
rules as computing aids; second, the presentation of some basic operational 
rules of mathematics for later use throughout the book. These topics will 
be reviewed separately in the following outline: 

I. Computing Aids 

(1) Logarithms. A common logarithm is defined by the equation

    $\log A = L$  if  $10^L = A$

The basic laws of logarithms are given by equations 3-2-1, 2, and 3; and
3-5-1:

    $\log AB = \log A + \log B$
    $\log A/B = \log A - \log B$
    $\log A^n = n \log A$
    $\log \sqrt[n]{A} = (1/n) \log A$

The uses of these laws for rapid computation are illustrated in Articles
5 and 6.

(2) Slide Rule. All the various manipulations with logs can be
performed with a slide rule with an accuracy of about one part in a
thousand. The operational procedures are summarized in Table 3-8-1.

II. Basic Mathematical Tools for Statistics

(1) Manipulations of Exponents. The basic laws are given by
equations 3-3-1 to 3-3-7:

    $A^x A^y = A^{x+y}$
    $A^x / A^y = A^{x-y}$
    $(A^x)^y = A^{xy}$
    $A^1 = A$
    $A^0 = 1$    $(A \neq 0)$
    $A^{-n} = 1/A^n$    $(A \neq 0)$
    $A^{1/n} = \sqrt[n]{A}$

These equations are illustrated in Article 3.

(2) Summation Symbol. This symbol is defined by equation 3-9-1:

    $\Sigma x = x_1 + x_2 + x_3 + \cdots + x_N$

or, for a frequency tabulation,

    $\Sigma x = f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_N x_N$

The operational rules governing the use of the symbol are given by
equations 3-9-2 to 3-9-5:

    $\Sigma(x + y) = \Sigma x + \Sigma y$
    $\Sigma(x - y) = \Sigma x - \Sigma y$
    $\Sigma(Cx) = C\,\Sigma x$
    $\Sigma C = NC$

(3) Bar Symbol for the Arithmetic Mean. This symbol is defined
by equation 3-10-1:

    $\bar{x} = \Sigma x / N$

The operational rules governing its use are given by equations 3-10-2 to
3-10-5:

    $\overline{x + y} = \bar{x} + \bar{y}$
    $\overline{x - y} = \bar{x} - \bar{y}$
    $\overline{Cx} = C\bar{x}$
    $\bar{C} = C$

Examples of the uses of these symbols are given in Article 11.



CHAPTER 4

ARITHMETIC MEAN AND STANDARD DEVIATION


1. INTRODUCTION 

In Chapter 2 we studied methods by which a large body of data can be
expressed in a concisely summarized form by means of a frequency
tabulation and a histogram. In this chapter we will introduce an alternative
form of description, which is mathematical instead of graphical. In this
new kind of description, we compute a few key numbers which describe
or measure various properties of the distribution. A complete discussion
of these numerical measurements will be given in Chapter 7; but two
of them must be introduced now because of the role which they play in
the development of the theory to come. These two are the arithmetic
mean and the standard deviation.

2. ARITHMETIC MEAN

The definition of this quantity has already been given in equation 3-10-1,
which tells us that the arithmetic mean is the sum of all the variates
divided by the number of variates. For ungrouped data, the arithmetic
mean can be computed directly from this definition, as shown in Article
10 of the preceding chapter.

For grouped data, we proceed as follows. The average value of the
variates in any single class is likely to be in the neighborhood of the class
mark, and, lacking more exact information, we proceed as if all the variates
had exactly the value of the class mark. We introduce a column headed
“x,” in which we list the class marks, and a column headed “fx,” in which
each x is multiplied by the frequency of the class, as shown in Table 4-2-1.
We sum this last column (giving 585 in the example), and divide by N
(which is 45 in the example, as we see by summing the f column), giving
585/45 or 13 for the value of $\bar{x}$. For reference purposes we will describe
this operation by means of a standard equation:

    $\bar{x} = \Sigma fx / N$  (for grouped data)    (4-2-1)


Table 4-2-1. Arithmetic Mean from Definition



In general, this method of finding $\bar{x}$ is far too slow for practical
computations and should be used only when the frequency tabulation is very
short or when the class marks are whole numbers and not very large.
More rapid methods will be presented in Articles 4 and 5.
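
Equation 4-2-1 can be sketched in a few lines of Python. This is an illustrative addition: the class marks and frequencies below are those listed in Table 4-3-2, which the text describes as the same distribution and which reproduce the totals 585 and 45 quoted above; the variable names are assumptions.

```python
# Class marks and frequencies (as listed in Table 4-3-2).
class_marks = [5, 8, 11, 14, 17, 20]
freqs = [1, 7, 12, 14, 8, 3]

N = sum(freqs)                                           # sum of the f column
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))  # sum of the fx column
mean_x = sum_fx / N                                      # equation 4-2-1

print(N, sum_fx, mean_x)
```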

PROBLEMS 

1. Find the mean of the forty temperatures in Table 2-3-2. How does this
compare with the mean of the same temperatures as obtained from the original
data in Table 2-3-1? How do you account for the difference?

2. Find the arithmetic mean for Table 2-2-2. Would you expect the same mean
for the data in Table 2-2-1?


3. STANDARD DEVIATION 

Before defining the standard deviation, let us define an intermediate
quantity. The deviation of a variate is the difference between the variate
and the arithmetic mean. It is denoted by the symbol d:

    $d = x - \bar{x}$    (4-3-1)

For example, the deviation of the second variate in Table 4-3-1 is 5 minus
7, or minus 2. The standard deviation is a special kind of average of the
deviations; it is the square root of the arithmetic mean of the squares of
the deviations. It is denoted by the Greek letter small sigma:

    $\sigma = \sqrt{\overline{d^2}} = \sqrt{\overline{(x - \bar{x})^2}}$    (4-3-2)

The notation in this equation is very compact, and it requires careful
reading. The equation tells us that the standard deviation is to be found
by computing the deviations of all the variates, squaring each of these
deviations, adding these squares, dividing by the number of variates, and
finding the square root of the result.

The standard deviation can be computed directly from this defining
equation for both grouped and ungrouped data. For ungrouped data,
the procedure is shown in Table 4-3-1. We begin by computing the
arithmetic mean, which we find to be equal to 7. We subtract 7 from
each of the variates in the x column (obtaining a negative result whenever
the value of x is less than 7) and enter the results in the column headed d.
We then square each value of d and enter the results in the column headed
$d^2$. We next sum this column (obtaining 50 in the example) and then divide
the result by the number of variates (6), which gives us $\overline{d^2}$ (8.33 in the
example). We then take the square root of $\overline{d^2}$, obtaining 2.89, which is the
standard deviation.


Table 4-3-1. Standard Deviation of Ungrouped Data

      x       d      d^2

      7       0       0        $\Sigma x = 42$
      5      -2       4        $N = 6$
      2      -5      25        $\bar{x} = 7$
     11       4      16
      8       1       1        $\Sigma d^2 = 50$
      9       2       4        $\overline{d^2} = 50/6 = 8.33$

     42              50        $\sigma = \sqrt{8.33} = 2.89$

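
The steps just described translate directly into code. The following Python sketch is an illustrative addition using the six variates of Table 4-3-1. Note that the divisor is N, not N - 1, so the result corresponds to what Python's standard library calls the population standard deviation (`statistics.pstdev`).

```python
import math
import statistics

x = [7, 5, 2, 11, 8, 9]          # variates from Table 4-3-1
N = len(x)

mean_x = sum(x) / N               # arithmetic mean
d = [xi - mean_x for xi in x]     # deviations, equation 4-3-1
sum_d2 = sum(di ** 2 for di in d) # sum of squared deviations
mean_d2 = sum_d2 / N              # mean of the squared deviations
sigma = math.sqrt(mean_d2)        # equation 4-3-2

print(mean_x, sum_d2, round(mean_d2, 2), round(sigma, 2))

# Cross-check against the standard library (divisor N).
assert math.isclose(sigma, statistics.pstdev(x))
```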

For grouped data, the procedure is shown in Table 4-3-2. We begin by
finding $\bar{x}$ in the usual way (in this case $\bar{x}$ is 13, as we found from Table
4-2-1), and we then subtract this value of $\bar{x}$ from each value of x and enter
the result under the d column. We square each of these values of d, and
multiply each such square by f, and enter the result in the $fd^2$ column.
We next sum this column (obtaining 576 in the example), and divide by
N (which is 45). This gives us 12.8 for $\overline{d^2}$. Finally we take the square root
of this, giving us 3.58 for $\sigma$.


Table 4-3-2. Standard Deviation for Grouped Data

      x       f       d      d^2     fd^2

      5       1      -8      64       64       $\Sigma d^2 = 576$
      8       7      -5      25      175
     11      12      -2       4       48       $N = 45$
     14      14       1       1       14
     17       8       4      16      128       $\overline{d^2} = 576/45 = 12.8$
     20       3       7      49      147

             45                      576       $\sigma = \sqrt{12.8} = 3.58$

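
The grouped computation differs from the ungrouped one only in that each squared deviation is weighted by its frequency. A minimal Python sketch of Table 4-3-2 follows; it is an illustrative addition and the variable names are assumptions.

```python
import math

class_marks = [5, 8, 11, 14, 17, 20]   # x column of Table 4-3-2
freqs = [1, 7, 12, 14, 8, 3]           # f column

N = sum(freqs)
mean_x = sum(f * x for f, x in zip(freqs, class_marks)) / N

# Each squared deviation is multiplied by its frequency before summing.
sum_fd2 = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, class_marks))
mean_d2 = sum_fd2 / N
sigma = math.sqrt(mean_d2)

print(mean_x, sum_fd2, mean_d2, round(sigma, 2))
```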

This direct method is not to be recommended for general use. It is
included here to familiarize you with the basic definition of the standard
deviation, but unless the set of data is particularly simple, more rapid
methods are to be preferred.

It might appear to some readers that we are wasting time in first
squaring the deviations, averaging them, and then taking the square root,
thereby undoing what we did in the first place. A little reflection, however,
will show that this is not quite the same as simply averaging the
deviations. In Table 4-3-1, for example, the absolute* deviations are 0, 2, 5, 4,
1, and 2, and the mean of these is 2.33, which is a little smaller than the
standard deviation. The standard deviation is a kind of average of the
deviations, but it is an average in which the large deviations are given
somewhat more weight than the smaller ones.
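
The comparison can be made concrete with a short Python sketch (an illustrative addition) that computes both averages for the variates of Table 4-3-1.

```python
import math

x = [7, 5, 2, 11, 8, 9]           # variates from Table 4-3-1
N = len(x)
mean_x = sum(x) / N

# Mean of the absolute deviations (signs ignored).
mean_abs_dev = sum(abs(xi - mean_x) for xi in x) / N

# Standard deviation: root of the mean of the squared deviations.
sigma = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / N)

print(round(mean_abs_dev, 2), round(sigma, 2))
```

Squaring before averaging lets the deviations of 5 and 4 contribute more than they do in the plain average, which is why the second figure is the larger one.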

PROBLEMS 

1. Compute the standard deviation of the following set of numbers: 14, 17,
12, 10, 19, and 18.

2. Compute the standard deviation of the following data.

x f 

13-15 2 

16-18 5 

19-21 9 

22-24 5 

25-27 2 

4. RAPID METHODS FOR UNGROUPED DATA 

The methods in the preceding sections would be excessively lengthy
if applied to most of the bodies of data encountered in practical problems.
The material with which a statistician deals usually consists of far more
variates than do the illustrative problems used in this book, and the
variates themselves are likely to be more complicated numbers. In this
and the following section we will develop rapid methods for computing
the mean and the standard deviation.

The variates, or the class marks, are usually rather simple numbers,
while the arithmetic mean is usually a lengthy decimal, and therefore so
is each of the deviations. The computations can be shortened, therefore,
if we avoid the use of the deviations and work directly with the variates
themselves. To see how this is possible, let us square out the right-hand
side of equation 4-3-2, as follows:

    $\sigma = \sqrt{\overline{(x - \bar{x})^2}} = \sqrt{\overline{x^2 - 2x\bar{x} + \bar{x}^2}}$

*The absolute value of any number is its numerical value with the plus or minus
sign ignored.

Now, by equation 3-10-2, we can rewrite the mean of the sum as the
sum of the means:

    $\sigma = \sqrt{\overline{x^2} - \overline{2x\bar{x}} + \overline{\bar{x}^2}}$

In the middle term of this equation, both 2 and $\bar{x}$ are constants, and
equation 3-10-4 tells us that they can be written in front of the averaging sign.
The last term, $\bar{x}^2$, is itself a constant, so that by equation 3-10-5 its mean is
simply $\bar{x}^2$:

    $\sigma = \sqrt{\overline{x^2} - 2\bar{x}\bar{x} + \bar{x}^2}$

The middle term is now seen to be simply $2\bar{x}^2$, and it can be combined with
the right-hand term:

    $\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$    (4-4-1)

In words, this equation tells us that the standard deviation is the square
root of the mean of the squares of the variates minus the square of the
mean of the variates. Its use is demonstrated in Table 4-4-1, which you
will find to be self-explanatory.


Table 4-4-1. Use of Equation $\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$

      x      x^2

      2       4
      9      81        $\overline{x^2} = 320/10 = 32$
      7      49
      8      64        $\bar{x} = 50/10 = 5$
      3       9
      4      16        $\bar{x}^2 = 5^2 = 25$
      8      64
      5      25        $\sigma = \sqrt{32 - 25} = 2.65$
      2       4
      2       4

     50     320
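
The computation in Table 4-4-1 can be sketched in Python and compared with the defining equation 4-3-2. This is an illustrative addition with assumed variable names.

```python
import math

x = [2, 9, 7, 8, 3, 4, 8, 5, 2, 2]   # variates from Table 4-4-1
N = len(x)

mean_x = sum(x) / N                       # 50/10
mean_x2 = sum(xi * xi for xi in x) / N    # 320/10

# Equation 4-4-1: no deviations are needed.
sigma_rapid = math.sqrt(mean_x2 - mean_x ** 2)

# Equation 4-3-2, for comparison.
sigma_definition = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / N)

print(mean_x, mean_x2, round(sigma_rapid, 2))
```

In floating-point work, equation 4-4-1 can lose precision when the mean is large compared with the spread; the change of zero point described in the text also helps in that situation.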



Another possibility for saving time arises when the variates are large
numbers which do not differ very much from one another, such as those
in the left-hand column of Table 4-4-2. In such cases a great deal of time
can be saved by first subtracting a fixed number from all of the variates
in order to reduce them to a more manageable size. Let $x_0$ stand for such
a fixed number, or zero point, chosen at the investigator’s convenience.
Our objective is then to express $\bar{x}$ and $\sigma$ in terms of $x - x_0$ rather than in
terms of x alone. To do so, let us observe that $x = x_0 + (x - x_0)$, and
then let us take the mean of both sides of this equation. The result is:

    $\bar{x} = \overline{x_0 + (x - x_0)}$



The right-hand side of this equation can be separated into two means by
applying equation 3-10-2, giving us

    $\bar{x} = x_0 + \overline{(x - x_0)}$    (4-4-2)

In words, this equation tells us that the mean of any variate is the mean
of its deviations from any arbitrary zero point, plus that zero point. Its
use is illustrated in the left-hand side of Table 4-4-2, in which 10842.0 has
been chosen for the zero point. The mean of the deviations from this
starting point is found to be 0.533, and the mean of the original numbers
is therefore 10842.0 plus 0.533, or 10842.533.


Table 4-4-2. Change of Zero Point

     Arithmetic Mean              Standard Deviation

       x         x - x_0             (x - x_0)^2

    10842.3        0.3                  0.09
    10842.7        0.7                  0.49
    10843.1        1.1                  1.21
    10842.1        0.1                  0.01
    10842.8        0.8                  0.64
    10842.2        0.2                  0.04

                   3.2                  2.48

    $x_0 = 10842.0$
    $\overline{x - x_0} = 3.20/6 = 0.533$
    $\bar{x} = 10842.0 + 0.533 = 10842.533$

    $\overline{(x - x_0)^2} = 2.48/6 = 0.413$
    $(\overline{x - x_0})^2 = 0.533^2 = 0.284$
    $\sigma = \sqrt{0.413 - 0.284} = 0.359$


To see how the standard deviation can be computed in terms of these
deviations from the zero point, let us express $\overline{x^2}$ and $\bar{x}^2$ in terms of
$x - x_0$, and insert the results in equation 4-4-1. Starting with $\overline{x^2}$, we have

    $\overline{x^2} = \overline{[(x - x_0) + x_0]^2} = \overline{(x - x_0)^2 + 2(x - x_0)x_0 + x_0^2}$

By applying equation 3-10-2 we can separate this into three terms:

    $\overline{x^2} = \overline{(x - x_0)^2} + \overline{2(x - x_0)x_0} + \overline{x_0^2}$

and since 2, $x_0$, and $x_0^2$ are all constants, we can apply equations 3-10-4
and 3-10-5, giving us

    $\overline{x^2} = \overline{(x - x_0)^2} + 2x_0\,\overline{(x - x_0)} + x_0^2$

Now let us leave this result for a moment and find the value of $\bar{x}^2$:

    $\bar{x}^2 = [\overline{(x - x_0) + x_0}]^2 = [\overline{(x - x_0)} + x_0]^2$

or

    $\bar{x}^2 = (\overline{x - x_0})^2 + 2x_0\,\overline{(x - x_0)} + x_0^2$

Inserting these values for $\overline{x^2}$ and $\bar{x}^2$ in equation 4-4-1 and simplifying, we
have

    $\sigma = \sqrt{\overline{(x - x_0)^2} - (\overline{x - x_0})^2}$    (4-4-3)

The use of this equation is demonstrated in the right-hand column of
Table 4-4-2.
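
Equations 4-4-2 and 4-4-3 can be sketched in Python with the data of Table 4-4-2. This is an illustrative addition; the variable names are assumptions, and the results are compared with a direct computation from the definition to confirm that the choice of zero point does not change the answer.

```python
import math

x = [10842.3, 10842.7, 10843.1, 10842.1, 10842.8, 10842.2]
x0 = 10842.0                       # zero point chosen for convenience
N = len(x)

shifted = [xi - x0 for xi in x]    # deviations from the zero point
mean_shifted = sum(shifted) / N
mean_shifted_sq = sum(s * s for s in shifted) / N

mean_x = x0 + mean_shifted                               # equation 4-4-2
sigma = math.sqrt(mean_shifted_sq - mean_shifted ** 2)   # equation 4-4-3

# Direct computation from the defining equation 4-3-2, for comparison.
direct_mean = sum(x) / N
direct_sigma = math.sqrt(sum((xi - direct_mean) ** 2 for xi in x) / N)

print(round(mean_shifted, 3), round(mean_x, 3), round(sigma, 3))
```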

PROBLEMS 

1. Find the standard deviation of the numbers in Table 4-3-1, using the method
of equation 4-4-1.

2. Find the arithmetic mean and the standard deviation of the following set of
numbers: 49638, 49644, 49632, 49637, 49641, 49640, 49639, 49645, 49631, and
49644. Which of the equations for $\bar{x}$ and $\sigma$ are most suitable for this problem?

5. RAPID METHODS FOR GROUPED DATA 

If the data are in the form of a frequency tabulation, there is a further
way in which we can effect a saving of time. Since each class mark is
equal to the first class mark plus a multiple of the class interval, we can
take out the class interval as a factor throughout and simplify the
computations. Let us introduce the following notation:

    $x_0$ = the class mark of any convenient class, usually the
          one containing the largest number of variates.
    $u$ = the serial number of any class, starting with u = 0
          for the class labeled $x_0$ and increasing with
          increasing values of x.                              (4-5-1)

An example of a choice of $x_0$ and an assignment of a set of values of u is
shown in Table 4-5-1. Note that the values of u increase downward,
following the values of x, and that negative values of u are assigned to
classes with class marks smaller than $x_0$.

Our working equations have until now been expressed in terms of x,
and we now wish to express them in terms of u. The first step is to write
in mathematical form the relationship between the old and the new
variables. This relationship can be obtained from an inspection of Table
4-5-1; it is

    $x = x_0 + Cu$    (4-5-2)

where C is the class interval, described in Chapter 2, Article 3. For
example, x for the last class is 549.5 plus 20 times 2, and x for the first
class is 549.5 plus 20 times -3. If we wish to obtain the value of u
corresponding to any given value of x, we can rewrite the above equation
in the form

    $u = (x - x_0)/C$    (4-5-3)


Table 4-5-1. Rapid Method for Grouped Data

    Limits        x               f       u       fu      fu^2

    480-499     489.5             3      -3       -9      27*
    500-519     509.5            11      -2      -22      44
    520-539     529.5            21      -1      -21      21
    540-559     549.5 (= x_0)    27       0        0       0
    560-579     569.5            14       1       14      14
    580-599     589.5             5       2       10      20

                                 81              -28     126
                                (N)         ($\Sigma u$)  ($\Sigma u^2$)

    Computations

    $N = 81$;  $C = 20$;  $x_0 = 549.5$
    $\bar{u} = -28/81 = -0.346$
    $\overline{u^2} = 126/81 = 1.56$
    $\bar{x} = 549.5 + 20(-0.346) = 542.6$
    $\sigma = 20\sqrt{1.56 - (-0.346)^2}$
        $= 20\sqrt{1.56 - 0.12} = 24.0$

*This column is most quickly computed by multiplying u by fu.

To express the arithmetic mean in terms of u and $x_0$, let us start with
equation 4-4-2, and in it replace $x - x_0$ by Cu, according to equation 4-5-3:

    $\bar{x} = x_0 + \overline{x - x_0} = x_0 + \overline{Cu}$

Since C is a constant, we can move it outside of the bar symbol for the
arithmetic mean, by equation 3-10-4:

    $\bar{x} = x_0 + C\bar{u}$    (4-5-4)

For the standard deviation, we begin with equation 4-4-3 and again
replace $x - x_0$ by Cu:

    $\sigma = \sqrt{\overline{(x - x_0)^2} - (\overline{x - x_0})^2} = \sqrt{\overline{(Cu)^2} - (\overline{Cu})^2}$

Since C is a constant we can again apply 3-10-4:

    $\sigma = \sqrt{C^2\,\overline{u^2} - C^2\,\bar{u}^2} = \sqrt{C^2(\overline{u^2} - \bar{u}^2)}$

or, finally,

    $\sigma = C\sqrt{\overline{u^2} - \bar{u}^2}$    (4-5-5)


72 INTRODUCTION TO THE THEORY OF STATISTICS [CH. 4 

If we use $\sigma_u$ to denote the standard deviation in u, then by equation
4-4-1,

    $\sigma_u = \sqrt{\overline{u^2} - \bar{u}^2}$    (4-5-6)

With this, equation 4-5-5 becomes simply

    $\sigma_x = C\sigma_u$    (4-5-7)

where we have used $\sigma_x$ to distinguish the standard deviation in x units
from that in u units. The uses of these equations are demonstrated in
Table 4-5-1, in which all the details of the computations are shown. These
equations provide a very rapid method of obtaining the arithmetic mean
and the standard deviation, even when the frequency tabulation is very
lengthy.
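
The computations of Table 4-5-1 can be reproduced with a short Python sketch. This is an illustrative addition with assumed variable names; it applies equations 4-5-4 and 4-5-5 and then confirms the result against the defining equations applied directly to the class marks.

```python
import math

x0, C = 549.5, 20                  # zero point and class interval
u = [-3, -2, -1, 0, 1, 2]          # serial numbers of the classes
f = [3, 11, 21, 27, 14, 5]         # class frequencies

N = sum(f)
sum_fu = sum(fi * ui for fi, ui in zip(f, u))
sum_fu2 = sum(fi * ui * ui for fi, ui in zip(f, u))

mean_u = sum_fu / N
mean_u2 = sum_fu2 / N

mean_x = x0 + C * mean_u                          # equation 4-5-4
sigma_x = C * math.sqrt(mean_u2 - mean_u ** 2)    # equation 4-5-5

# Check against the definitions, using the class marks x = x0 + C*u.
marks = [x0 + C * ui for ui in u]
direct_mean = sum(fi * xi for fi, xi in zip(f, marks)) / N
direct_sigma = math.sqrt(
    sum(fi * (xi - direct_mean) ** 2 for fi, xi in zip(f, marks)) / N
)

print(N, sum_fu, sum_fu2, round(mean_x, 1), round(sigma_x, 1))
```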


PROBLEMS 

Use equations 4-5-4 and 4-5-5 in all cases. It is suggested that in each problem
you leave room for two additional columns at the right of the $fu^2$ column, for further
computations upon these data to be described in future chapters.

1. Find the arithmetic means and the standard deviations of the two
distributions of temperatures which you obtained in Problem 1, Article 3, Chapter 2.
Does this additional information clarify the nature of the difference between the
two distributions?

2. Find the arithmetic mean and the standard deviation of the distribution in
Table 4-2-1.

3. Find the arithmetic mean and the standard deviation of the examination
scores in Table 2-6-1.

4. Find the arithmetic mean and the standard deviation of the data which you
gathered for Problem 4, Article 4, Chapter 2.

5. Find the arithmetic mean and the standard deviation of the ages of the
dementia praecox patients in Table 2-5-1.

6. SUMMARY 

The arithmetic mean ($\bar{x}$) and the standard deviation ($\sigma$) are useful for
describing any distribution. They are also useful as intermediate quantities
to be used in further analyses of the data by methods to be described later
in the book.

The arithmetic mean is defined by equation 3-10-1:

    $\bar{x} = \Sigma x / N$

or, for a frequency tabulation,

    $\bar{x} = \Sigma fx / N$

and the standard deviation is defined by equation 4-3-2:

    $\sigma = \sqrt{\overline{(x - \bar{x})^2}}$

In practice it is usually inconvenient to compute these quantities directly
from their definitions, and the following rapid methods should be used
instead:

I. For distributions consisting of relatively few ungrouped small variates,
use equations 3-10-1 and 4-4-1:

    $\bar{x} = \Sigma x / N$

    $\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$

The procedure is demonstrated in Table 4-4-1.

II. For distributions consisting of relatively few ungrouped large
variates, use equations 4-4-2 and 4-4-3:

    $\bar{x} = x_0 + \overline{x - x_0}$

    $\sigma = \sqrt{\overline{(x - x_0)^2} - (\overline{x - x_0})^2}$

where $x_0$ is any convenient zero point. The procedure is demonstrated in
Table 4-4-2.

III. For distributions consisting of grouped data, use equations 4-5-4
and 4-5-5:

    $\bar{x} = x_0 + C\bar{u}$

    $\sigma = C\sqrt{\overline{u^2} - \bar{u}^2}$

where $x_0$ is the class mark of any class chosen at the convenience of the
investigator (usually the largest class is chosen) and u is the serial number
of any class, starting with u = 0 for the class labeled $x_0$, and increasing
with increasing values of x. The method is demonstrated in Table 4-5-1.



PROBABILITY


1. INTRODUCTION 

The concept of probability is widely used and understood. We do not
need a formal definition to understand the meaning of the statement that
the probability of a snowfall in March is smaller than the probability of a
snowfall in January, or that the probability that a man will be killed by a
traffic accident is greater than the probability that he will be killed by
lightning. We all indicate our opinions about the degree of probability of
various events, by means of such expressions as “His chances of being
elected are very small,” or “The odds are against us,” or even “The train
has probably arrived by now.”

Statements such as these convey useful information, but it is obvious
that they would be much more useful if they could be expressed in
quantitative terms. For example, a surgeon may tell a man who has undergone
a cancer operation that there will “probably not be a recurrence.” But
the word “probably” covers a variety of meanings, and if the surgeon can
instead make a quantitative statement which describes the degree of
probability, the patient will have a much more reliable basis for planning
his life. Our first task is to define the concept of probability in an exact
numerical way.


2. PROBABILITY DEFINED 

If we let s be the number of ways in which a given event E can succeed,
and f the number of ways in which it can fail, then the probability that
E will occur is

    $P(E) = s/(s + f)$    (5-2-1)

and the probability that E will not occur is*

    $P(\text{not } E) = f/(s + f)$    (5-2-2)

*These abbreviations for probabilities of events are customarily read “P of E” and
“P of not E.”

We see from these definitions that

    $P(\text{not } E) = 1 - P(E)$    (5-2-3)

To illustrate these equations, let us compute the probability that a card
drawn at random from a standard deck will be a spade. Since there are
thirteen spades in the deck, the event can occur in thirteen ways, and s
is 13. There are thirty-nine non-spades in the deck, so that the event can
fail to occur in thirty-nine ways, and f is 39. The probability that the
card will be a spade is therefore 13/(13 + 39), or 1/4, or 0.25, and the
probability that it will not be a spade is 39/(13 + 39), or 3/4, or 0.75.

It is obvious from the definition that the probability scale ranges from
zero to one; zero is the probability of an impossible event, and one is the
probability of a certain event; and all other probabilities lie between these
limits. If the probability is 1/2, or 0.50, then the event is exactly as
likely to succeed as to fail.

In everyday language, it is customary to describe probabilities by statmg 
the ratio of the favorable to the unfavorable cases, that is, by the ratio of 
s to f “The chances are three to one m his favor” becomes, in our terms, 
“The probability that he will succeed is 3/4.” 

In practice the method of finding s and f must depend upon the individual
problem. For example, let us compute the probability that when two
dice are thrown, the sum of the two numbers will be 7. The answer is
obtained as follows. The first die can fall in any of six ways, and with
each of these the second can fall in any of six ways, so that there are
thirty-six ways in which the pair can fall, and s + f is therefore 36. To find s,
we must list and count the pairs which total 7; they are 1 and 6, 2 and 5,
3 and 4, 4 and 3, 5 and 2, and 6 and 1, where the first number of each
pair refers to the first die and the second number to the second die. We
see that s equals 6, and the required probability is therefore 6/36, or
0.167.
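
The listing and counting described above can also be done mechanically. The following Python sketch is offered only as an illustration of definition 5-2-1; the function name `probability_of_sum` is our own and does not come from the text.

```python
from fractions import Fraction
from itertools import product


def probability_of_sum(target, faces=6):
    """Return P(sum of two dice == target) by counting equally likely pairs."""
    # Every ordered pair (first die, second die); there are s + f = 36 of them.
    outcomes = list(product(range(1, faces + 1), repeat=2))
    # The s pairs in which the event succeeds.
    successes = [pair for pair in outcomes if sum(pair) == target]
    return Fraction(len(successes), len(outcomes))


p_seven = probability_of_sum(7)
print(p_seven, round(float(p_seven), 3))
```

Exact fractions are used so that the result can be compared directly with 6/36.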

PROBLEMS 

1. If a card is drawn at random from a standard deck, what is the probability 
that it will be (a) a heart? (b) A black card? (c) Smaller than a 5? 

2. A girl who is 5'8" tall is offered a blind date with one of her roommate’s
brothers. If there are three brothers, and they are 5'7", 5'10", and 5'11" in height,
what is the probability that the girl will be taller than her escort? That she will
be shorter?

3. If two dice are thrown, what is the probability that (a) their sum will be 2? 
(b) That their sum will be 5? (c) What is the most likely sum? 

3. DISCUSSION OF DEFINITION 

The fundamental definition of probability given in 5-2-1 is simple and
easy to use in most cases, but there are some subtleties of reasoning in
its application which are not apparent at first glance. The following
remarks may help you to avoid ambiguities in applying the definition to
practical problems.

(1) In counting the number of ways in which an event may happen, it
is necessary to remember that these ways of happening must be equally
likely.* If we ignore this requirement we can arrive at absurdities like
the following. A given group of hospitalized soldiers included 105 patients
suffering from malaria, 247 from typhus, 5 from pneumonia, 23 from
gunshot wounds, and 2 from poison gas illness. Since there are three ways
in which a soldier in the group might be ill from natural causes, and two
ways in which he might be ill as a result of enemy action, it follows that
the probability that a given soldier is ill from natural causes is 3/5, or
0.60(!). This result would of course be justified only in the event that all
five types of illness were equally likely.

(2) A little reflection will show that a statement about probability is
not a statement about a physical situation, but is instead a statement
about a particular observer’s knowledge of the situation. A given event
can have one probability relative to the information in the possession of
one observer, and a totally different probability relative to the information
in the possession of another observer. To clarify this statement, let us
picture a specific experiment.

Imagine that a deck of cards is shuffled and placed on a table, and that
the top card is then removed and laid to one side. Three observers, A, B,
and C, are asked to state the probability that this card which was removed
is a spade. Observer A, who has seen none of the cards, replies that the
probability is 0.25. Observer B is permitted to examine the four bottom
cards of the deck before giving his answer. He sees that these four cards
include three spades and a diamond, so that from his point of view there
are only 48 unknown cards left, of which 10 are spades and 38 are
non-spades. He replies that the probability that the top card is a spade is
10/48, or 0.208 approximately. Observer C, however, has caught a glimpse
of the top card as it was being removed, and he has seen that it was in fact
the ten of diamonds. Relative to his information the probability that it
is a spade is of course zero. The physical arrangement of the cards is
unchanged; yet three different statements about the probability are all
correct, each relative to a different body of information.
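
The three answers can be reproduced by applying definition 5-2-1 to the cards that remain unknown to each observer. The sketch below is an illustration under that reading; the function `spade_probability` and its arguments are hypothetical names introduced here.

```python
from fractions import Fraction


def spade_probability(spades_seen, other_cards_seen):
    """P(the removed card is a spade), given cards the observer has seen
    elsewhere in the deck (not the removed card itself)."""
    unknown_spades = 13 - spades_seen
    unknown_cards = 52 - spades_seen - other_cards_seen
    return Fraction(unknown_spades, unknown_cards)


p_a = spade_probability(0, 0)  # observer A has seen no cards
p_b = spade_probability(3, 1)  # observer B saw three spades and a diamond
p_c = Fraction(0)              # observer C saw the card: the ten of diamonds

print(float(p_a), round(float(p_b), 3), float(p_c))
```

The same deck yields three different values because the count of unknown cards, not the deck itself, changes from one observer to the next.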

A statement about probabilities implies possession of incomplete
information about a situation. A “correct” statement is simply a statement
of the best possible guess in the light of this incomplete information.

*We are thus forced to use the concept of equal probability in defining probability,
and the definition is therefore partly circular.

PROBLEMS

1. Criticize the following computation: A given lake contains bluegills, black
bass, white bass, perch, smallmouth bass, and catfish. Since there are six kinds
of fish, of which three kinds are bass, the probability that a fish caught at random
will be a bass is 0.50.

2. In a rummy game, two players are dealt the following hands:

            First Player    Second Player

Spades      A, 10, 9        K
Clubs       10              2, 3, 4, J
Diamonds    7, 3            J
Hearts      K               9

What is the probability that the first card drawn from the remainder of the deck
will be a club? Give three answers, relative to the information in the possession
of (a) the first player, (b) the second player, and (c) a spectator who sees both
hands. Which of the three probabilities do you think is most reliable?

3. A die is so weighted that a five turns up twice as often as any other face.
(a) What is the probability that a five will turn up, relative to the information
given above? (b) What is the probability for a man who does not know that the
die is weighted? (c) What is it for a man who knows that the die is weighted, but
does not know which face the weighting favors?

4. Compute the correct probability in the first illustrative example (about
hospitalized soldiers) in the foregoing article.

4. EMPIRICAL PROBABILITIES

In practice, it is frequently more convenient to look upon the probability
of an event as simply the percentage of cases in which the event can be
expected to occur. It is then not necessary to enumerate the ways in
which the event can succeed or fail; we can instead estimate the probability
on the basis of past experience. In the absence of other information, the
best estimate of the probability is simply the percentage of cases in which
the event has occurred in the past. If, for example, we are asked the
probability that a card drawn from an incomplete deck will be a spade,
and if we know that in previous experiments four cards out of ten drawn
from this deck have turned out to be spades, then the best estimate of the
probability that any card drawn in the future will be a spade is 0.40.

Probabilities which are estimated in this way are sometimes called
“empirical probabilities.” Empirical probabilities are widely used in
life insurance work and in other applied fields. For instance, if we wish
to estimate the probability that a boy who is now 10 years old will still
be alive at age 30, we have only to consult the American Experience
Mortality Table (Appendix III in this volume), which shows us that out of
every 100,000 Americans who are alive at the age of 10, 85,441 have
survived until age 30. The probability that a given 10-year-old boy will
survive until age 30 is therefore 0.85. In the same way, we see that of
92,637 people who are alive at age 20, 57,917 will be alive at age 60, and
the probability that any given 20-year-old will survive until 60 is
therefore 57,917/92,637, or 0.63.
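
An empirical probability is nothing more than a ratio of observed counts, as the following short Python sketch shows. It uses only the figures quoted above; the function name `empirical_probability` is our own.

```python
def empirical_probability(occurrences, trials):
    """Estimate a probability as the fraction of past cases in which the event occurred."""
    if trials <= 0:
        raise ValueError("at least one past case is needed")
    return occurrences / trials


# Figures quoted in the text from the mortality table.
alive_at_10, alive_at_30 = 100_000, 85_441
alive_at_20, alive_at_60 = 92_637, 57_917

p_10_to_30 = empirical_probability(alive_at_30, alive_at_10)
p_20_to_60 = empirical_probability(alive_at_60, alive_at_20)
p_spade = empirical_probability(4, 10)  # four spades in ten past draws

print(round(p_10_to_30, 2), round(p_20_to_60, 2), p_spade)
```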

It is perhaps worth emphasizing again the relationship between
probability and knowledge. We saw in the preceding paragraph that the
probability that a given 20-year-old will survive to the age of 60 is 0.63.
This is the correct probability if we know only that he is American and
20 years old. If we know also that he is 54 pounds overweight, or that
both of his parents lived beyond the age of 90, then the given probability
would no longer be the correct one, and we would instead have to base the
probability upon tables containing data about people comparable to
himself. The most reliable probabilities are of course those which are
based upon the most information. It is worth noting, for example, that
life insurance companies find it worth while to collect a great deal of
medical information about their clients before granting them insurance.

PROBLEMS 

Problems 1, 2, and 3 should be solved with the aid of the American Experience
Mortality Table given in Appendix III.

1. What is the probability that a man who is now 20 will still be alive at 40?
At 60? At 80? At 95?

2. What is the probability that a man who is now 93 will survive at least one
more year? Two more years?

3. What is the age which a man of 20 has a 0.5 chance of reaching?

4. Using the data in Table 2-2-2, find the probability that a wire chosen at
random will have a breaking strength of (a) 207 pounds, (b) more than 204 pounds.

5. MATHEMATICAL EXPECTATION 

The concept of probability can be illustrated very clearly in terms of
the related concept of expectation, which is a measurement of the cash
value of an uncompleted gambling operation. The expectation (Exp.) of
a given venture is defined as the probability (P) that it will succeed, times
the gain (G) which will result if it does succeed.

Exp. = P × G (5-5-1)

If, for example, a man holds five lottery tickets out of a total of 100,
and the prize is to be worth $200, then his expectation is $200 times 0.05,
or $10. If the holder of the tickets should decide to sell his tickets before
the drawing is held, this is the value which should be placed upon them.
In general, the decision about the advisability of entering any venture
should depend upon a balancing of the cash cost of the venture against the
expectation which is being purchased. If a poker player plans to call a bet
of $1, which would increase the pot to $6, then his probability of winning
should be at least 1/6 in order to justify his calling the bet; otherwise he
will be spending his dollar for less than a dollar’s worth of expectation and
will in the long run lose.
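
Equation 5-5-1 and the rule of balancing cost against expectation can be sketched in a few lines of Python. This is an illustration only; the names `expectation` and `worth_entering` are hypothetical and are not taken from the text.

```python
from fractions import Fraction


def expectation(probability, gain):
    """Equation 5-5-1: expectation = probability of success times gain."""
    return probability * gain


def worth_entering(probability, gain, cost):
    """True when the expectation purchased is at least equal to the cash cost."""
    return expectation(probability, gain) >= cost


# Five lottery tickets out of 100, with a prize worth $200.
lottery_value = expectation(Fraction(5, 100), 200)

# Calling a $1 bet that raises the pot to $6.
call_at_one_sixth = worth_entering(Fraction(1, 6), 6, 1)
call_at_one_seventh = worth_entering(Fraction(1, 7), 6, 1)

print(lottery_value, call_at_one_sixth, call_at_one_seventh)
```

At a probability of exactly 1/6 the dollar buys exactly a dollar's worth of expectation, which is why 1/6 is the break-even point in the poker example.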

A slightly different situation arises in the purchase of insurance. We
can see from Appendix III that the probability that a man who is now 20
will die within a year is approximately 0.008. If such a man owns a life
insurance contract for $1000, covering him for one year, then his
expectation is obviously about $8. Since he must pay more than this for the
contract, he is accepting a statistical loss on the transaction. The
purchasers of insurance are willing to accept such a loss because the primary
purpose of insurance is not to gain on the venture but to distribute the
financial burden caused by individual catastrophes.

PROBLEMS 

1. If a 30-year-old man has a one-year life insurance contract which will pay
$5000 in case of his death, what is his expectation? What is his expectation if he
is 90 years old?

2. The merchants in a given city promote Christmas shopping each year by
giving away numbered lottery tickets with each purchase. At a later drawing, the
holder of the winning ticket receives an automobile worth about $1800. The
serial numbers on tickets issued late in the distribution period are larger than
700,000. About what is the expectation for the holder of a single ticket? If you
were offered a block of 100 tickets for 50¢ just before the drawing, should you
accept the offer?

3. A nursery operator observes that a given kind of seedlings survive to salable
size about two times out of three. They sell for $2 apiece. If he plants 100 such
seedlings, what is his total expectation?

6. INDEPENDENT EVENTS 

When we are discussing the probabilities of several events, it is
necessary to use a notation which distinguishes between them. If we are
discussing two events, called event A and event B, then it will be
convenient to let P(A) mean the probability of event A, and P(B) the
probability of event B. P(A and B) will then be understood to mean the
probability that both events will happen, and P(A or B) the probability
that one or the other will happen. It is the objective of this and the
following sections to show how these composite probabilities can be
computed from the probabilities of the separate events.

Let us first consider the probability that both of two events will occur.
For clarity, we must make the following distinction. Two events are
said to be dependent if the outcome of the first has an effect upon the
probability that the second will occur, and independent if the outcome of
the first has no effect upon the second. If, for example, we draw two cards
from a deck and ask for the probability that both are spades, we must
know whether or not the first card was replaced in the deck before the
second card was drawn. If it was replaced, then the probability that the
second card will be a spade is 1/4, whether the first card was a spade or
not, and the two events are therefore independent. If, on the other hand,
the first card was not replaced, then the probability that the second card
will be a spade is either 12/51 or 13/51, depending upon whether the first
card was a spade or not, and the two events are therefore dependent.

To find the probability that both of two independent events will occur,
we must count the ways in which the composite event can succeed and the
ways in which it can fail. The first event can happen in any of s_1 + f_1 ways,
and following any one of these the second event can happen in any of
s_2 + f_2 ways, so that the total number of ways in which the pair of events can
take place is (s_1 + f_1) times (s_2 + f_2). By the same argument, the first event
can succeed in any of s_1 ways, and following any of these the second event
can succeed in any of s_2 ways, making a total of s_1 times s_2 ways in which
both of the events can succeed together. The probability that both will
succeed is therefore

s_1 s_2 / [(s_1 + f_1)(s_2 + f_2)]

This can be written in the form

[s_1/(s_1 + f_1)] × [s_2/(s_2 + f_2)]

or simply P(A) times P(B). We therefore have the following result:
The probability that both of two independent events will occur is the
product of their separate probabilities. For reference we will write this:

P(A and B) = P(A) × P(B) (A and B independent) (5-6-1)

This can obviously be extended to any number of events.

To illustrate this equation, let us compute the probability that a 4 will
turn up on all three successive throws of a die. The probability of a 4
on each throw is 1/6, and since the events are independent, the probability
that a four will turn up on all the throws is (1/6) × (1/6) × (1/6), or
1/216, or 0.0046.
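
The product rule 5-6-1 can be checked against a direct count of the 216 equally likely sequences of three throws. The Python sketch below is an illustration only; its variable names are our own.

```python
from fractions import Fraction
from itertools import product

# Product rule (5-6-1): multiply the separate probabilities.
p_single = Fraction(1, 6)
p_by_product = p_single * p_single * p_single

# Direct count (5-2-1): list every sequence of three throws.
sequences = list(product(range(1, 7), repeat=3))
favorable = [seq for seq in sequences if seq == (4, 4, 4)]
p_by_counting = Fraction(len(favorable), len(sequences))

print(p_by_product, p_by_counting, round(float(p_by_product), 4))
```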

PROBLEMS 

1. Three construction firms are bidding for a contract. On the basis of past
experience, the probability that A’s bid will be higher than B’s is 0.60, and the
probability that C’s will be higher than B’s is 0.40. What is the probability that
B will win the contract with the lowest bid?

2. If two dice are thrown, what is the probability that both will come up 5’s?
(Use equation 5-6-1.)

3. A student wishes to borrow $5 from his roommate. He knows that the
roommate has exactly $5 in his possession, but the probability is 0.3 that the
roommate will spend part of it for a movie during the day, and 0.4 that he will
lend part of the $5 to another roommate. What is the probability that the $5 will
remain intact?

4. On a particularly dangerous bombing mission, it is estimated that one-third
of the planes will be lost. What is the probability that a given plane will survive
three such missions?

7. DEPENDENT EVENTS

The argument used for the derivation of equation 5-6-1 can be used,
with a slight modification, for the case in which the two events are
dependent. In this case the total number of ways in which the pair of events
can occur is (s_1 + f_1) times (s_2 + f_2), as before, but the total number of ways
in which both can succeed must be reconsidered. The first of the two
events can succeed in s_1 ways, and following any one of these there will
be s_2 ways in which the second can succeed, and there are therefore s_1 times
s_2 ways in which both will succeed. But s_2 must now obviously mean the
number of ways that the second event can succeed after the first event
has already succeeded. If, for example, we draw two cards from a deck
and ask for the probability that both will be spades, then there are thirteen
ways in which the first card can be a spade, but, corresponding to each of
these, there are only twelve ways in which the second card can also be a
spade. The derivation of 5-6-1 is obviously still valid if this change of
meaning is kept in mind. We can state this result as follows: The
probability that both of two dependent events will occur is equal to the
probability that the first will occur multiplied by the probability that the
second will occur, the latter probability being computed on the assumption
that the first has already occurred. For reference we will put this in
equation form:

P(A and B) = P(A) × P(B if A has occurred) (A and B dependent) (5-7-1)

This can obviously be extended to any number of events. For example,
if we are to draw three cards from a deck and wish to know the probability
that all three will be spades, then we proceed as follows. The probability
that the first card will be a spade is 1/4. If the first card is a spade, then the
probability that the second card will also be a spade is 12/51. If both of
these are spades, the probability that the third will also be a spade is 11/50.
The probability that all will be spades is the product of these three
probabilities, or 11/850, or approximately 0.0129. Similarly, the probability
that the first will be a spade and the other two will be diamonds is 1/4
times 13/51 times 12/50, or approximately 0.0153.
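
The chain of conditional probabilities in equation 5-7-1 lends itself to a simple loop, in which each factor is computed on the assumption that all earlier draws succeeded. The following Python sketch is illustrative; the function name `all_same_suit` is hypothetical.

```python
from fractions import Fraction


def all_same_suit(draws, suit_size=13, deck_size=52):
    """P(every one of `draws` cards, drawn without replacement, is of one named suit)."""
    probability = Fraction(1)
    for k in range(draws):
        # After k successes, suit_size - k cards of the suit remain
        # among deck_size - k cards.
        probability *= Fraction(suit_size - k, deck_size - k)
    return probability


p_three_spades = all_same_suit(3)

# A spade first, then two diamonds: 13/52 × 13/51 × 12/50.
p_spade_then_two_diamonds = Fraction(13, 52) * Fraction(13, 51) * Fraction(12, 50)

print(p_three_spades, round(float(p_three_spades), 4))
print(round(float(p_spade_then_two_diamonds), 4))
```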

PROBLEMS 

1. If two cards are drawn from a deck, what is the probability that both will be
diamonds? That the first will be a diamond and the second a club? That neither
will be diamonds?

2. The probability that a given candidate will be nominated is 0.3, and the
probability that he will be elected if nominated is 0.4. What is the probability
that he will hold the office?

3. In a given medical school, 28 per cent of the students drop out in their first
year, 13 per cent of those remaining drop out in their second year, and 9 per cent
of the remainder drop out before graduating. What is the probability that a
given entering student will graduate?

4. The probability that a given bombing plane will make a successful flight
to the target area is 0.9, the probability that the navigator will locate the target
is 0.7, and the probability that the bombardier will score a hit is 0.4. When the
plane takes off, what is the probability that the target will be hit?

8. MUTUALLY EXCLUSIVE EVENTS 

If two events are alternative ways in which a single operation can turn
out, that is, if the occurrence of one event means that the other event
cannot occur, then the two events are said to be mutually exclusive. If,
for example, we draw a single card from a deck and ask for the probability
that it will be either a spade or a club, then these two possibilities are
mutually exclusive: if it is a spade it cannot be a club, and vice versa. In
this article we will discuss the probability that either one or the other
of two mutually exclusive events will occur.

The number of ways in which the first event can occur is s_1, and the
number of ways in which the second event can occur is s_2. Since the two
events are mutually exclusive, there is no overlap between these two sets
of ways, and the total number of ways in which one or the other can occur
is simply s_1 + s_2. The probability that either one or the other will occur
is therefore (s_1 + s_2)/n, where n is the total number of ways in which the
event can turn out. We can rewrite this in the form s_1/n plus s_2/n. In
other words, the probability that either one or the other of two mutually
exclusive events will occur is the sum of their separate probabilities. For
reference purposes, we will write this in equation form:

P(A or B) = P(A) + P(B) (A and B mutually exclusive) (5-8-1)

For example, the probability that a single card drawn from a deck will
be either a spade or a club is 1/4 plus 1/4, or 0.50.
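
The addition rule 5-8-1 can be confirmed by counting cards directly. In the Python sketch below the deck is represented as (rank, suit) pairs; this representation and the helper name `probability` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(range(1, 14), suits))  # 52 (rank, suit) pairs


def probability(event):
    """Fraction of cards in the deck for which `event(card)` is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


p_spade = probability(lambda card: card[1] == "spades")
p_club = probability(lambda card: card[1] == "clubs")
p_either = probability(lambda card: card[1] in ("spades", "clubs"))

print(p_spade, p_club, p_either)
```

Because no card is both a spade and a club, the count for the combined event is exactly the sum of the two separate counts.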

PROBLEMS 

1. If a die is thrown, what is the probability that either a 5 or 6 will come up? 

2. If a card is drawn from a standard deck, what is the probability that it will be 
either a spade or a diamond? 

3. If two dice are thrown, what is the probability that the two faces will not
total 5? (Use equation 5-2-3.)


9. PERMUTATIONS 

In problems concerning the probability that several events will all occur,
we can either multiply together the separate probabilities, according to
equation 5-6-1, or we can list and count the number of ways in which the
composite event can succeed or fail, and use equation 5-2-1. If we make
the latter choice, the enumeration of the separate possibilities may become
very lengthy. If we can instead set up formulas by means of which the
number of possible ways can be computed, then this listing and counting
will not be necessary. One such formula will be derived in this article.

A permutation is an arrangement or a sequence of a number of objects.
There are, for example, six permutations of the letters A, B, and C; they
are ABC, ACB, BAC, BCA, CAB, and CBA. Instead of listing and
counting these six permutations, we could have deduced their number as
follows. Each permutation is to contain three letters. To obtain any one
permutation we must select a letter to fill the first place, and then, from
the remaining letters, we must select one to fill the second place, and so
forth. There are three ways to fill the first place (with an A, B, or C),
and corresponding to each of these there are two ways to fill the second
(if, for example, we fill the first space with B, then we can fill the second
with either A or C). Thus there are three times two ways of filling the
first two spaces, and, corresponding to each of these, there is only one way
to fill the third space. The total number of permutations is therefore
3 × 2 × 1, or 6. It is convenient to introduce the following abbreviation:


n! = 1 × 2 × 3 × ... × n (5-9-1)

This is read “n factorial.” For example, “3 factorial,” or 3!, is 1 × 2 × 3,
or 6, and 4! is 24. Using this notation, we can generalize the results of the
preceding paragraph by the statement that the number of permutations
of n objects is n!.

In statistics, we must frequently compute the number of permutations
which can be made up from n objects when only r of them are used in any
given permutation. We will call this “the number of permutations of n
objects taken r at a time,” and abbreviate it Perm(n,r). For example,
the number of permutations of 4 objects taken two at a time is 12, as we
can readily see by listing the possible two-letter words which we can form
from the letters A, B, C, and D. To obtain a general equation for the
number of permutations of n objects taken r at a time, let us picture the
process of forming a specific permutation of r letters from a pool
containing n letters. We can choose, for the first of the r letters, any letter in
the pool; that is, we can fill the first space in any of n different ways.
Whichever one we choose, there will be only n - 1 letters remaining in the
pool, and we can therefore fill the second space in any one of n - 1 ways.
There are therefore n × (n - 1) ways of filling the first two spaces, or
n × (n - 1) × (n - 2) ways of filling the first three, or finally, n × (n - 1)
× (n - 2) × ... × (n - r + 1) ways of filling all r places.

This result can be written more simply if we multiply it by (n - r)!
and then divide it by the same amount. The effect of the multiplication
is to supply the factors from n - r down to one.

Perm(n,r) = [n(n - 1)(n - 2) ... (n - r + 1)] (n - r)! / (n - r)!

or, combining the factors in the numerator,

Perm(n,r) = n!/(n - r)! (5-9-2)

For the special case of n objects taken all at a time, we must use the result
obtained earlier, which we can now formulate as follows:

Perm(n,n) = n! (5-9-3)

For example, Perm(5,2) is (5 × 4 × 3 × 2 × 1)/(3 × 2 × 1), or 20. It is
suggested that you list the two-letter permutations to be made up from the
letters A, B, C, D, and E and verify this conclusion.
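
The listing suggested above can be delegated to a machine and compared with equation 5-9-2. The following Python sketch is an illustration; `perm` is our own helper written from the formula, and the listing relies on the standard `itertools.permutations` function.

```python
from itertools import permutations
from math import factorial


def perm(n, r):
    """Equation 5-9-2: permutations of n objects taken r at a time."""
    return factorial(n) // factorial(n - r)


# List every two-letter permutation of A, B, C, D, E and count them.
listed = ["".join(pair) for pair in permutations("ABCDE", 2)]

print(perm(5, 2), len(listed))
print(listed[:5])
```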

To illustrate the use of these formulas for probability problems, let us
compute the probability that a given player in a bridge game will receive a
hand consisting of thirteen spades. To find this we may proceed as follows.
There are Perm(52,13) possible different bridge hands (counting each
permutation as a different hand), so that s + f in equation 5-2-1 is
Perm(52,13). Included among these there are Perm(13,13) hands which
consist of all spades, and s is therefore Perm(13,13). The required probability
is Perm(13,13)/Perm(52,13), or

13!/(52!/39!)

or approximately 0.0000000000016. If we imagine that an inveterate
bridge player plays thirty hands per evening, 365 days per year, then we
can readily compute that he should expect an all-spade hand about once
in every sixty million years!

The problem in the preceding paragraph is given to illustrate the use of
permutations in probability computations, but it would have been possible
to obtain the same result by means of equation 5-7-1, as follows. The
probability that the first card of the thirteen will be a spade is 13/52. On
the assumption that the first card was a spade, the probability that the
second card will also be a spade is 12/51. Continuing, we see that the
probability that all thirteen will be spades is (13/52) × (12/51) × (11/50)
× ... × (2/41) × (1/40), which gives us the same result as before.
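
Both routes to the all-spade probability can be carried out with exact arithmetic and compared. The Python sketch below is an illustration under the assumptions stated in the text (thirty hands per evening, 365 evenings per year); the helper `perm` is written from equation 5-9-2 and the variable names are our own.

```python
from fractions import Fraction
from math import factorial


def perm(n, r):
    """Equation 5-9-2: permutations of n objects taken r at a time."""
    return factorial(n) // factorial(n - r)


# Route 1: count ordered hands, Perm(13,13)/Perm(52,13).
p_by_counting = Fraction(perm(13, 13), perm(52, 13))

# Route 2: equation 5-7-1, (13/52) × (12/51) × ... × (1/40).
p_by_chain = Fraction(1)
for k in range(13):
    p_by_chain *= Fraction(13 - k, 52 - k)

# Expected waiting time at thirty hands per evening, every evening.
hands_per_year = 30 * 365
years_per_all_spade_hand = 1 / (float(p_by_counting) * hands_per_year)

print(float(p_by_counting))
print(round(years_per_all_spade_hand / 1e6), "million years")
```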

PROBLEMS 

1. Work out the details of the computation of the probability that a random
bridge hand will consist of thirteen spades. (It is suggested that you collect
powers of 10 as suggested in Article 8 of Chapter 3.)

2. A hostess is planning a luncheon with eight guests to be seated around a
table, and she is trying to find a seating arrangement which will be most congenial
for everyone. How many seating arrangements must she consider, if (a) any person
may occupy any seat? (b) the hostess seats herself at the head of the table? (c)
guest A must not be seated next to guest B or guest C? (d) there are four men and
four women, and men and women must be seated alternately?

3. A fraternity containing fifteen members wishes to elect a president, a vice
president, a treasurer, a recording secretary, a corresponding secretary, and a
sergeant-at-arms. In how many different ways is it possible to select such a slate
of officers?


10. PERMUTATIONS WITH SOME IDENTICAL OBJECTS 

If some of a set of n objects are identical with each other, the number of
different permutations which can be formed is obviously smaller than it
would be if the objects were all distinct. For example, an experiment will
readily show that we can form only twelve different four-letter
permutations from the letters A, A, B, and C, although we can form
twenty-four from the letters A, B, C, and D. To arrive at this result without
listing and counting the twelve permutations, we can instead reason as
follows. If we temporarily identify the two identical objects separately
by calling them A_1 and A_2, then there are 4!, or 24, separate permutations,
according to equation 5-9-2. But this count includes BA_1CA_2 and
BA_2CA_1 (for example) as two different words, when they are in fact identical
if we drop the subscripts. We can correct for this duplication if we divide
our preliminary result by 2. If there had been three identical letters, it
would obviously have been necessary to divide by 3!, or 6. If k_1 objects
are identical, then Perm(n,n) is n!/k_1!, and if k_2 others are also identical,
then we must divide this result by k_2!. In general,

Perm(n,n,k_1,k_2, ...) = n!/(k_1! k_2! ...) (5-10-1)

where k_1 objects are identical, k_2 others are also identical, and so forth.
For example, let us compute the number of five-letter “words” which can
be formed from the letters AAABB. Here n is 5, k_1 is 2, and k_2 is 3.
Equation 5-10-1 then gives us Perm(5,5,2,3) = 5!/(2! × 3!), or 10.
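
Equation 5-10-1 can be checked against a brute-force listing in which duplicate words are discarded. The Python sketch below is illustrative; the function name `distinct_permutations` is hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_permutations(letters):
    """Equation 5-10-1: n! divided by k! for each group of k identical letters."""
    count = factorial(len(letters))
    for k in Counter(letters).values():
        count //= factorial(k)
    return count


# Brute force: form all n! orderings, then drop the duplicates.
listed_aaabb = set(permutations("AAABB"))
listed_aabc = set(permutations("AABC"))

print(distinct_permutations("AAABB"), len(listed_aaabb))
print(distinct_permutations("AABC"), len(listed_aabc))
```

Letters that occur only once contribute a factor of 1!, so they leave the count unchanged.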

The usefulness of this equation is illustrated by the following problem.
If each of two parents has one gene for blue eyes and one for brown eyes,
then according to the laws of genetics, the probability that their child will
have blue eyes is 1/4, and the probability that he will have brown eyes is
3/4. If the parents have five children, what is the probability that three
of them will have brown eyes and two will have blue eyes?

To answer this question, we proceed as follows. The probability that the
first child will have brown eyes is 3/4, and the probability that the second
child will have brown eyes is 3/4; therefore, by equation 5-6-1, the
probability that both will have brown eyes is 9/16. Continuing in this way,
the probability that the first three children will have brown eyes and that
the last two will have blue eyes is (3/4)^3 × (1/4)^2.

Obviously the result will be the same if we ask for the probability that
there will be three brown-eyed children and two blue-eyed children in
some other order; for example, the probability that the first and fourth
children will have blue eyes and the others brown eyes is also (3/4)^3 ×
(1/4)^2. Since all these various possible sequences are mutually exclusive,
we can obtain the probability that one or another of them will occur by
adding their separate probabilities. In other words, we must multiply
(3/4)^3 × (1/4)^2 (which is the probability for a given sequence) by the
number of possible sequences. The number of possible sequences is
obtained directly from 5-10-1, with n = 5, k_1 = 3, and k_2 = 2. The
required probability is therefore

P = (3/4)^3 × (1/4)^2 × 5!/(3! × 2!) = 0.26


It is desirable to include a standard reference equation for problems of
this type. Stated in general terms, the problem is this: If the probability
that a given event will succeed is p, and the probability that it will fail is
q, then, in n such events, what is the probability that there will be exactly
s successes? To answer this, we need only to generalize our results. The
probability that there will be exactly s successes and n - s failures in a
given order is p^s q^(n-s), and the number of possible orders is
n!/[s! (n - s)!]. The total probability is therefore

P = {n!/[s! (n - s)!]} p^s q^(n-s) (5-10-2)

For example, let us solve the following problem: If six dice are thrown,
what is the probability that there will be exactly two 5’s? To apply
5-10-2, we will call a 5 a success, so that p is 1/6 and q is 5/6. Upon
inserting these values in 5-10-2, with n = 6 and s = 2, we find that the
probability of exactly two 5’s is [6!/(2! × 4!)] × (1/6)^2 × (5/6)^4, or
approximately 0.20.
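
Equation 5-10-2 translates directly into a short function. The following Python sketch is an illustration written from the formula; the function name `exactly` is our own, and exact fractions are used so that the results can be compared with the worked examples.

```python
from fractions import Fraction
from math import factorial


def exactly(s, n, p):
    """Equation 5-10-2: probability of exactly s successes in n events,
    where each event succeeds with probability p and fails with q = 1 - p."""
    q = 1 - p
    orders = factorial(n) // (factorial(s) * factorial(n - s))
    return orders * p**s * q**(n - s)


# Three brown-eyed children out of five, with p = 3/4 for brown eyes.
p_three_brown = exactly(3, 5, Fraction(3, 4))

# Exactly two 5's when six dice are thrown, with p = 1/6.
p_two_fives = exactly(2, 6, Fraction(1, 6))

print(round(float(p_three_brown), 2), round(float(p_two_fives), 2))
```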


PROBLEMS 

1. Using equation 5-10-2, compute the probability that the parents in the
illustrative problem will have (a) three blue-eyed children and two brown-eyed
children, (b) four brown-eyed and one blue-eyed, (c) five blue-eyed children.*

2. If five dice are thrown, what is the probability that there will be three 1’s
and two 6’s?

3. How many six-letter “words” can be formed from the letters AAABBC?

4. If six coins are tossed, what is the probability that there will be (a) four heads
and two tails? (b) Five heads and one tail? (c) Six heads?

*In part c the quantity 0! must be evaluated. Since n! = (n + 1)!/(n + 1), we can
see that 0! must equal 1!/1, or 1, for consistency of meaning.


11. COMPOSITE PROBABILITY PROBLEMS


Many problems can be solved most effectively by means of a
combination of the equations demonstrated in this chapter. It is the purpose
of this article to bring together these methods by means of a set of
representative problems in probability.

(1) If six dice are thrown, what is the probability that there will be at
least two 6’s? Answer: Equation 5-10-2 can be used to discover the
probability that there will be exactly two 6’s, but there is no simple
equation which will tell us the probability that there will be at least two
6’s. We must therefore assemble the answer as follows. The probability of
exactly two 6’s is 9375/46,656; of exactly three, 2500/46,656; of exactly
four, 375/46,656; of exactly five, 30/46,656; and of exactly six, 1/46,656.
Since these are mutually exclusive, we can add them to obtain the
probability that one or another of these events will take place, in other
words, that there will be at least two 6’s. The required probability is
therefore 12,281/46,656 or 0.263.

We could have shortened the labor of this computation by computing
first the probability that there will not be at least two 6’s. The
probability that there will be only one 6 is 0.402, and the probability that
there will be no 6’s is 0.335, so that the probability that our event will
not take place is 0.737. From 5-2-3, we see that the probability that it
will take place is 1 minus 0.737, or 0.263, as before.
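Both routes to the answer can be verified with exact fractions. This is a minimal sketch in Python, not part of the original text; the helper name exactly is an assumption made for illustration.

```python
from fractions import Fraction
from math import comb


def exactly(n, s, p):
    """Equation 5-10-2: probability of exactly s successes in n trials."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)


p = Fraction(1, 6)  # probability of a 6 on one die

# First route: add the mutually exclusive cases of two, three, ..., six 6's.
direct = sum(exactly(6, s, p) for s in range(2, 7))

# Second route: subtract the cases of no 6 and one 6 from unity.
complement = 1 - exactly(6, 0, p) - exactly(6, 1, p)

print(direct, complement, float(direct))
```

Exact fractions are used so that the two routes can be compared without rounding error.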

(2) A man plans to take an examination to qualify for a promotion.
The examination consists of five problems, and in order to qualify he must
solve the first three problems, and he must also solve either the fourth or
fifth, or both. In preparing for the examination, he has tested himself on
comparable problems, and he believes that the probability that he can
solve the first problem is 0.8, and for the others 0.6, 0.9, 0.75, and 0.3, in
that order. What is the probability that he will qualify? Answer: Let us
begin by computing the probability that he will solve either the fourth or
the fifth, or both. We can obtain this by adding the probabilities that he
will solve the fourth but not the fifth (0.75 × 0.7), the fifth but not the
fourth (0.25 × 0.3), and both the fifth and the fourth (0.75 × 0.3). This
gives us a total probability of 0.825. (Alternatively, we could compute
the probability that he would fail on both (0.25 × 0.7) and then subtract
this from one to find the probability that he would succeed on at least
one.) The problem is now resolved into finding the probability that all
four events will take place, and the four probabilities are 0.8, 0.6, 0.9,
and 0.825. Following equation 5-5-1, we multiply these together and find
that the required probability is 0.356.
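The arithmetic of this example can be retraced as follows. The sketch is in Python and is not part of the original text; the variable names are illustrative, and the five problems are assumed independent, as the solution above implies.

```python
# Estimated probabilities of solving problems one through five.
p1, p2, p3, p4, p5 = 0.8, 0.6, 0.9, 0.75, 0.3

# Route 1: add the three mutually exclusive ways of solving at least one
# of the last two problems.
by_addition = p4 * (1 - p5) + (1 - p4) * p5 + p4 * p5

# Route 2: subtract the probability of failing both from unity.
by_complement = 1 - (1 - p4) * (1 - p5)

# All four events (problems 1, 2, 3, and "fourth or fifth") must occur.
qualify = p1 * p2 * p3 * by_complement

print(round(by_addition, 3), round(by_complement, 3), round(qualify, 3))
```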

(3) Suppose that you were offered the following wager: Two cards are
to be drawn from a deck at random; if there is at least one spade, you win
50¢ from your opponent; if there are no spades, he wins 50¢ from you. Is
this a fair wager? Answer: At first glance this game appears to be evenly
matched, since the probability of drawing a spade is 1/4 and you have two
chances at it. This is, however, incorrect. The probability that two cards
drawn at random will both be non-spades is (39/52) × (38/51), or 0.56,
so that your opponent should expect to win in about fifty-six trials out
of a hundred. He would therefore win about twelve more times than you in a
hundred plays, and could expect to win about six dollars in one hundred
plays.
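The wager can be checked with exact fractions. This is a hedged sketch in Python, not part of the original text; the stake is expressed in dollars and all names are illustrative.

```python
from fractions import Fraction

# Probability that both cards drawn without replacement are non-spades.
p_no_spade = Fraction(39, 52) * Fraction(38, 51)
p_spade = 1 - p_no_spade

stake = Fraction(1, 2)  # 50 cents, in dollars

# Your mathematical expectation per play: win the stake with p_spade,
# lose it with p_no_spade.
expected_gain_per_play = stake * p_spade - stake * p_no_spade

print(float(p_no_spade), float(100 * expected_gain_per_play))
```

A negative expectation per play means that the wager favors the opponent.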

(4) A widely played gambling game consists of the following wager.
The player bets any sum of money on any number from 1 to 6; let us say
that he bets one dollar on 5. Three dice are then thrown. If there is one
5, he wins a dollar; if there are two 5’s, he wins two dollars; and if there
are three 5’s, he wins three dollars. Is this a fair wager? If not, how much
should the player expect to win or lose on a hundred plays? Answer: The
probabilities that there will be no 5’s, one 5, two 5’s, and three 5’s,
respectively, are 125/216, 75/216, 15/216, and 1/216. In 216 plays,
therefore, he can expect to lose one dollar 125 times, win one dollar 75
times, win two dollars 15 times, and win three dollars once. He should
therefore expect to lose seventeen dollars per 216 plays, and the operator
of the game should therefore expect to show a profit of about 7.9 per cent
of the money wagered.
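The player's expectation in this game follows from equation 5-10-2 and the payoff for each outcome. The sketch below is in Python and is not part of the original text; the dictionary names are illustrative.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)  # probability that a single die shows the chosen number

# Probability of k matching dice out of three (equation 5-10-2).
probs = {k: comb(3, k) * p**k * (1 - p) ** (3 - k) for k in range(4)}

# Gain in dollars on a one-dollar bet for each number of matching dice.
payoff = {0: -1, 1: 1, 2: 2, 3: 3}

expected = sum(probs[k] * payoff[k] for k in range(4))

print(probs, expected, float(expected))
```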

(5) A doctor finds that a patient has three characteristic symptoms
which are always present in a given disease. None of the symptoms is,
however, an absolute basis for a diagnosis, because not all people having
the symptoms have the disease. In particular, 4 per cent of the people
who do not have the disease nevertheless have the first symptom, 12
per cent have the second symptom, and 5 per cent have the third symptom.
If 1¼ per cent of all men in the patient’s age group have the disease and
if the symptoms are independent of each other for patients not having the
disease, what is the probability that the patient in question has the disease?

Answer: If N is the total number of men in the patient’s age group,
then the number of these who do not have the disease is (1 - 0.0125)N or
0.9875N. For any one of these, the probability that he will have all three
symptoms is 0.04 times 0.12 times 0.05, or 0.00024. The number of men
who do not have the disease but have all three symptoms is therefore
0.00024 times 0.9875N, or 0.000237N. The number of people who have
all three symptoms and do have the disease is 0.0125N. The total number
of people with all three symptoms is therefore 0.000237N plus 0.0125N, or
0.01274N, and since 0.0125N of them have the disease, the probability
that the patient has the disease is 0.0125N/0.01274N or 0.98. Thus the
combination of symptoms forms a very powerful diagnostic tool, even
though the occurrence of the symptoms separately means very little.
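The same reasoning can be carried out per man rather than per N men. The following Python sketch is not part of the original text; the variable names are illustrative, and it uses only the figures stated in the problem.

```python
from math import prod

prevalence = 0.0125  # 1 1/4 per cent of the age group have the disease

# Fraction of men WITHOUT the disease who show each symptom.
symptom_rates_without_disease = (0.04, 0.12, 0.05)

# The symptoms are independent among men without the disease,
# so the probability of all three is the product.
p_all_given_healthy = prod(symptom_rates_without_disease)

# The symptoms are always present in the disease.
p_all_given_disease = 1.0

with_disease = prevalence * p_all_given_disease
without_disease = (1 - prevalence) * p_all_given_healthy

# Fraction of men showing all three symptoms who have the disease.
posterior = with_disease / (with_disease + without_disease)

print(round(posterior, 2))
```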

PROBLEMS 

1. If four dice are thrown, what is the probability that there will be at least two
5’s? (Work this in two different ways.)

2. In the second illustrative problem in this article, what is the probability that
the man will solve the first and second problems correctly, plus at least one of
the last three?

3. Answer Problem 9, Chapter 1, Article 4, with the following additional
information: About 2 per cent of the men in the patient’s age group have the disease
in question.

4. A boy and his father, aged 11 and 56, are to share equally in an inheritance
of $40,000 at the end of ten years if both survive, and if only one survives he will
receive the entire inheritance. If neither survives, the inheritance will be given
to a specified college. What are the mathematical expectations of the son, the
father, and the college?

5. A man is 50 and his wife is 41. What is the probability that both will be
alive at the end of twenty years? That he will be living and she will be dead?
That she will be living and he will be dead? That both will be dead?

6. In Problem 4, Article 7, what is the probability that the target will be struck
if ten planes are assigned to the mission?

7. A man has three different pairs of socks in a drawer. If he enters the room in
the dark and takes two socks at random, what is the probability that they will be
a pair?

8. From a committee of eight men and three women, a subcommittee of four is
to be chosen by lot. What is the probability that it will consist of two men and
two women? That the men on the subcommittee will outnumber the women?

9. Cards in a box are numbered from 1 to 100 inclusive. If two are drawn at
random, what is the probability that their sum will be an odd number? An even
number?


12. SUMMARY 

The probability of an event A is defined (equation 5-2-1) as the number
of ways in which the event can succeed (s) divided by the total number of
ways in which the event can succeed or fail (s + f):

$$P(A) = \frac{s}{s + f}$$

where all the s + f ways in which the event can occur must be equally
likely. The probability scale runs from zero (for an impossible event) to
one (for a certain event).

The empirical probability of an event is the fraction of cases in which
the event has occurred in the past. Examples of its use are given in
Article 4.

The mathematical expectation of any venture is the probability that the
venture will succeed, times the gain which will result if it does succeed. It
is useful for evaluating the present cash value of an incomplete venture,
the outcome of which is in doubt. Examples are given in Article 5.

The probabilities of composite events can be computed by means of the
following laws:

I. The probability that both of two independent events will take place
is the product of their separate probabilities. (Two events are independent
if the outcome of the first does not affect the probability that the second
will occur.)

II. The probability that both of two dependent events will occur is the
probability of the first multiplied by the probability which the second will
have if the first has already occurred.

III. The probability that one or the other of two mutually exclusive
events will take place is the sum of their probabilities. (Two events are
mutually exclusive if it is impossible for both to take place.)

These three laws are proved and illustrated in Articles 6, 7, and 8.

In computing probabilities, and for other purposes in statistics, the
concept of a permutation is very useful. A permutation is a sequence of a
set of objects; thus ACB is one permutation of the first three letters in the
alphabet, and CAB is another. The number of permutations which can
be made from n objects, when only r of the objects are used in any given
permutation, is denoted by Perm(n, r) and is read “the number of
permutations of n objects taken r at a time.” It is given by equation 5-9-2,

$$\mathrm{Perm}(n, r) = \frac{n!}{(n - r)!}$$

where n! is defined by equation 5-9-1:

$$n! = 1 \times 2 \times 3 \times 4 \cdots n$$

If n is zero, then the special definition 0! = 1 applies instead.
If all of the n objects are used in each permutation, then we must use
equation 5-9-3:

$$\mathrm{Perm}(n, n) = n!$$

The above equations require that the n objects are all distinguishable
from each other. If instead $k_1$ of the objects are identical with each other,
and $k_2$ others are identical with each other, and so forth, then we must use
equation 5-10-1:

$$\mathrm{Perm}(n, n, k_1, k_2, \ldots) = \frac{n!}{k_1!\,k_2! \cdots}$$

Examples of the uses of these equations are given in Articles 9 and 10.
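The permutation formulas summarized above translate directly into a few lines of code. This is a minimal sketch in Python, not part of the original text; the function names perm and perm_identical are chosen here for illustration.

```python
from math import factorial


def perm(n, r):
    """Equation 5-9-2: permutations of n distinguishable objects taken r at a time."""
    return factorial(n) // factorial(n - r)


def perm_identical(n, *group_sizes):
    """Equation 5-10-1: permutations of n objects, all used, when groups of
    k1, k2, ... of them are identical with each other."""
    result = factorial(n)
    for k in group_sizes:
        result //= factorial(k)
    return result


print(perm(3, 3))               # all orderings of the letters A, B, C
print(perm(5, 2))               # five objects taken two at a time
print(perm_identical(5, 3, 2))  # the sequences of three brown-eyed and two blue-eyed children
```

Since factorial(0) returns 1, the special definition 0! = 1 is handled without a separate case, and perm(n, n) reduces to n! as in equation 5-9-3.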

If the probability that an event will succeed is p and the probability
that it will fail is q, then the probability that it will succeed exactly s times
in n trials is given by equation 5-10-2:

$$P(s) = \frac{n!}{s!\,(n - s)!}\, p^s q^{n-s}$$

The use of this equation for probability problems is shown in Article 10.
Its further use in the development of statistical theory will be explained in
the following chapter.

Practical problems in probability are likely to require some ingenuity
in fitting the theory to the problem, and many problems require a
combination of the principles here developed. Some examples of such composite
probability problems are discussed in Article 11.



CHAPTER 6

NORMAL CURVE


1. INTRODUCTION 

In Chapter 1 it was pointed out that many of the distributions occurring
in practical investigations show a remarkable similarity to each other. In
particular, it was pointed out in Article 4 that a histogram of the distances
which a group of high school girls can throw a baseball is very similar in
general appearance to that of the percentage of dry matter in mangel
roots, or to the egg sizes of a certain marine snail in Greenland, or to the
scores of a group of freshmen on an intelligence test. In this chapter we
will attempt to isolate and describe the factors which account for the
remarkable similarities in these apparently unrelated distributions and to
show what conclusions can be drawn from the assumption that these
factors are operative in any given distribution.

2. HISTOGRAM AS PROBABILITY GRAPH 

The first obstacle which arises in a comparison of distributions from
different sources is the fact that irrelevant differences may be present
which have been introduced by the investigators and which are not
intrinsic to the distributions. Our first task is to express the data in such
a form that these irrelevant differences disappear. The simplest such
difference arises from the fact that different investigators may have chosen
different sample sizes and different class intervals for the same
investigation.

The effect of different sample sizes is easily removed if we shift our
attention away from the frequency per class and direct it instead to the
probability per class, which we can obtain by dividing the frequency for
each class by N. From the data in Table 6-2-1, for example, we see that
eight patients out of forty had temperatures between 99.15 and 99.45,
and therefore the probability that a patient chosen at random from the
same universe will have a temperature between these limits is 8/40 or
0.200. The probabilities, computed in this way, are listed in the fourth
column of Table 6-2-1. Obviously these probabilities will be the same,
or nearly the same, for any two investigators working on the same problem,
regardless of the sample sizes which they may choose.

Table 6-2-1. Probability per x Unit

  Limits          x       f    Prob. per    Prob. per
                 (CM)            class        degree
                                 (f/N)        (f/NC)

  98.0- 98.2    98.1      2      0.050        0.167
  98.3- 98.5    98.4      5      0.125        0.417
  98.6- 98.8    98.7     10      0.250        0.833
  98.9- 99.1    99.0     10      0.250        0.833
  99.2- 99.4    99.3      8      0.200        0.667
  99.5- 99.7    99.6      4      0.100        0.333
  99.8-100.0    99.9      1      0.025        0.083

                         40

The effect of choice of class interval is easily removed if we shift our
attention from the probability per class to the probability per x unit. If
the probability is 0.200 that a patient chosen at random will have a
temperature between 99.15 and 99.45, then the probability per degree is
0.200/0.3 or 0.667. In general, we can obtain the probability per x unit
by dividing f/N by C:

$$P(x) = \frac{f}{NC} \tag{6-2-1}$$
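The last two columns of Table 6-2-1 can be reproduced from the frequencies alone. The sketch below is in Python and is not part of the original text; the variable names are illustrative.

```python
# Class marks (degrees) and frequencies from Table 6-2-1.
class_marks = [98.1, 98.4, 98.7, 99.0, 99.3, 99.6, 99.9]
frequencies = [2, 5, 10, 10, 8, 4, 1]

N = sum(frequencies)  # sample size, 40 patients
C = 0.3               # class interval in degrees

# Probability per class (f/N) and probability per degree (f/NC, equation 6-2-1).
prob_per_class = [f / N for f in frequencies]
prob_per_degree = [f / (N * C) for f in frequencies]

for x, pc, pd in zip(class_marks, prob_per_class, prob_per_degree):
    print(f"{x:5.1f}  {pc:.3f}  {pd:.3f}")

# Each class contributes (probability per degree) x (class interval) to the
# total area, which must come to unity.
total_area = sum(p * C for p in prob_per_degree)
```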

The probabilities per x unit are shown in the last column of Table 6-2-1
and are represented graphically in Figure 6-2-1. Obviously these
probabilities are independent of the choice of class interval as well as sample
size. A smooth curve has been drawn through these points to indicate
that it is likely that the probability per degree changes steadily rather
than abruptly as we go from one temperature to another. The use of
such a probability graph is demonstrated by the following examples.

(1) On the basis of the data in Figure 6-2-1, what is the probability that
a patient chosen at random from the same group will have a temperature
between 98.3 and 98.4? Answer: Since the interval here is only 0.1
degree wide, the probability does not change much within the interval,
and we can read the probability per degree from the midpoint of the
interval, at 98.35. We find this probability per degree to be about 0.37, as
shown by the left-hand arrow in the figure. Since the interval is 0.1
degree wide, and the probability per degree is 0.37, the total probability
for the interval is 0.1 times 0.37 or 0.037. Therefore, about 37 patients
per thousand should have temperatures within this interval.

It should be noted that in multiplying the base times the average height
of this vertical strip we have obtained its area. In general, the probability
that a variate chosen at random will lie between two limits is simply the area
under the probability curve between these two limits.

(2) What is the probability that a patient chosen at random will have
a temperature between 98.77 and 99.42? Answer: The interval is now so
wide that the probability changes considerably within the interval, and
the method used for the first problem is not accurate. We must instead
deduce the probability from a direct measurement of the area under the
curve between these two limits. This can be done conveniently and
accurately by means of a simple instrument called a planimeter, but if
such an instrument is not available, the area can be measured with
sufficient accuracy by either of the following procedures:

(a) If the curve is on graph paper, count the squares under the curve
and between the two limits, estimating the areas of the incomplete squares.
The probability equivalent of each square is obtained by multiplying the
base of any square by its height, in the units given by the graph. For
example, in Figure 6-2-1, the base of each square is 0.1 degree and the
height of each square is 0.1 probability unit per degree. One square
therefore represents a probability of 0.01, as indicated on the graph.

(b) A slightly less accurate but more rapid procedure is the following.
Draw vertical lines at each of the limits, and then lay a transparent ruler
down along the top of the area in such a way that it makes the best possible
compromise with the curved top boundary, as shown in Figure 6-2-1.
Compute the area under the resulting trapezoid as follows. The height of
the trapezoid is 0.93 at the left boundary and 0.60 at the right boundary,
as shown by the arrows. The average height is therefore 0.765. The
width of the base is 99.42 minus 98.77 or 0.65. The area is therefore
0.765 times 0.65 or 0.497. The required probability of an occurrence
between these limits is therefore 0.497; in other words, 497 patients out of
a thousand should have temperatures between these limits.
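Both area computations, the narrow strip of example (1) and the trapezoid of procedure (b), amount to base times average height. The following Python sketch is not part of the original text; the function names are illustrative, and the heights are the values read from Figure 6-2-1 in the text.

```python
def strip_probability(width, height_at_midpoint):
    """Area of a narrow strip: base times the height read at the midpoint."""
    return width * height_at_midpoint


def trapezoid_probability(left, right, height_left, height_right):
    """Area of a trapezoid: base times the average of the two end heights."""
    average_height = (height_left + height_right) / 2
    return (right - left) * average_height


# Example (1): interval 98.3 to 98.4, probability per degree about 0.37.
narrow = strip_probability(0.1, 0.37)

# Example (2), procedure (b): heights 0.93 and 0.60 at 98.77 and 99.42.
wide = trapezoid_probability(98.77, 99.42, 0.93, 0.60)

print(round(narrow, 3), round(wide, 3))
```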

(3) The following tabulation* shows the number of accidents reported
on Ohio highways in 1950, subdivided into groups according to the age of
the driver:

  Age          Number

  Under 16        483
  17              663
  18             1253
  19             1364
  20             1318
  21 to 24       6617
  25 to 44      19669
  45 to 64       7879
  Over 64        1485

What is the probability that the driver in any given accident is (a) 19
years old? (b) That he is 40 years old? (c) That he is between 17 and 22,
inclusive? (d) What age has the highest probability? Answer: The
difficulty in dealing with this data arises from the fact that the class width
differs from one class to the next, so that the frequencies cannot be
compared directly with each other. This difficulty arises frequently in practice,
since statistical data is often tabulated in this way. The difficulty is
immediately overcome by converting the data into probability per x unit.
The details are left to the student as an exercise.

PROBLEMS 

1. Convert the data in Table 2-2-2 to probability per pound. Plot the data
and draw a smooth curve, and read from it the following: the probability that a
given wire will have a breaking strength (a) between 205.3 and 206.1 pounds,
(b) above 205.8 pounds.

2. Reread illustrative problem (3) above. Convert the data to probability per
year, plot, and answer the questions asked in the problem.

3. Convert the data in Table 2-7-1 to probability per thousand dollars of income,
plot, and draw a smooth curve. What income has the highest probability?

3. THE PROBABILITY GRAPH IN t UNITS 

We have seen that it is possible to compare any two distributions of
similar variates, in spite of differences in class interval and sample size,
by expressing them in terms of probability per x unit, which we have called
P(x). If the two distributions are concerned with variates which are
measured in totally different units, then we must reconcile these units in
order to compare the distributions with each other. For example, if we
wish to investigate the properties which a distribution of intelligence
quotients has in common with a distribution of men’s heights, then it is
obviously foolish to try to compare the probability per IQ unit to the
probability per inch of height. We need a new common unit in which
both IQ and height can be measured, so that the resulting probabilities
per unit will be comparable.

*Reprinted from “Summary of Motor Vehicle Traffic Accidents in Ohio,” State of
Ohio, 1950, by permission.

For this purpose we define a new unit of measurement called a t unit.
The size of the new unit is simply the standard deviation of the
distribution, and the starting point is the arithmetic mean:

$$t = \frac{x - \bar{x}}{\sigma} \tag{6-3-1}$$

If, for example, the arithmetic mean of a distribution of IQ’s is 95, and
the standard deviation is 20, then an x score of 135 becomes a t score of
+2, and an x score of 65 becomes a t score of -1.5. The fourth column of
Table 6-3-1 shows a set of x’s reduced to t units.

If we change the horizontal scale of the probability graph from x units
to t units, then we must also change the vertical scale from probability per
x unit to probability per t unit. This is readily done by multiplying the
probability per x unit by the number of x units in a t unit, that is, by the
standard deviation. We will use the symbol P(t) for the probability per
t unit:

$$P(t) = P(x) \times \sigma = \frac{f\sigma}{NC} \tag{6-3-2}$$
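Equations 6-3-1 and 6-3-2 can be sketched as two small functions. This Python fragment is not part of the original text; the function names are assumptions made for illustration, and the numbers are those of the IQ example above.

```python
def t_score(x, mean, sigma):
    """Equation 6-3-1: distance from the mean, in standard deviations."""
    return (x - mean) / sigma


def prob_per_t_unit(prob_per_x_unit, sigma):
    """Equation 6-3-2: rescale probability per x unit to probability per t unit."""
    return prob_per_x_unit * sigma


# IQ example from the text: mean 95, standard deviation 20.
high = t_score(135, 95, 20)
low = t_score(65, 95, 20)

print(high, low)
```

The vertical scale is multiplied by the standard deviation because one t unit spans that many x units, so the area under the curve (the probability) is left unchanged.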


Table 6-3-1. Reduction to P(t)*

  Boundaries      f      x - x̄     (x - x̄)/σ     fσ/NC
   in Feet                                       = P(t)

    15- 25        1     -60.63      -2.89        0.007
    25- 35        2     -50.63      -2.42        0.014
    35- 45        7     -40.63      -1.94        0.048
    45- 55       25     -30.63      -1.46        0.173
    55- 65       33     -20.63      -0.99        0.228
    65- 75       53     -10.63      -0.51        0.366
    75- 85       64     - 0.63      -0.03        0.442
    85- 95       44     + 9.37      +0.45        0.304
    95-105       31     +19.37      +0.92        0.214
   105-115       27     +29.37      +1.40        0.187
   115-125       11     +39.37      +1.88        0.076
   125-135        4     +49.37      +2.36        0.028
   135-145        1     +59.37      +2.83        0.007

  N = 303,  C = 10,  x̄ = 80.63,  σ = 20.95,  σ/NC = 0.00691

*Reprinted by permission of Prentice-Hall, Inc., from Applied General Statistics by
Croxton and Cowden. Copyright 1939 by Prentice-Hall, Inc.

If we now plot P(t) against t, it is obvious that we will remove any effect
of the nature of the units in which the variates were originally measured
and leave for comparison only the basic pattern of the distribution.

To illustrate the procedure of reducing a frequency tabulation to a
graph showing the probability per t unit plotted against t units, let us use
the data in Figure 1-4-1, showing the distances which 303 freshmen high
school girls in Gary, Indiana, could throw a baseball. The procedure
(illustrated in Table 6-3-1) is as follows. First we subtract $\bar{x}$ from each
class mark (column 3) and divide each of the resulting numbers by σ
(column 4). The resulting numbers are, by definition, the distances
expressed in t units. To obtain P(t) we first compute σ/NC (which is 0.00691)
and then multiply it by each value of f (column 5). The values of P(t)
are plotted against t (filled circles in Figure 6-3-1), and a smooth curve
has been drawn by eye to fit the points as well as possible. If we apply
the same procedure to the data concerning the percentage of dry matter
in mangel roots (Figure 1-4-2), we obtain the set of points which are
shown by open circles in Figure 6-3-1. The differences between the two
distributions are no greater than we might expect from the random
variations between one sample and another.
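The reduction described above can be carried out for every class at once. The following is a minimal sketch in Python, not part of the original text; the variable names are illustrative, and the mean and standard deviation are taken as quoted in Table 6-3-1 rather than recomputed.

```python
# Lower class boundaries (feet) and frequencies from Table 6-3-1.
lower_bounds = list(range(15, 145, 10))  # 15, 25, ..., 135
frequencies = [1, 2, 7, 25, 33, 53, 64, 44, 31, 27, 11, 4, 1]

N = sum(frequencies)  # 303 variates
C = 10                # class interval in feet
mean = 80.63          # arithmetic mean quoted in the table
sigma = 20.95         # standard deviation quoted in the table

factor = sigma / (N * C)  # the constant sigma/NC

rows = []
for lower, f in zip(lower_bounds, frequencies):
    class_mark = lower + C / 2
    t = (class_mark - mean) / sigma  # equation 6-3-1
    p_t = f * factor                 # equation 6-3-2
    rows.append((class_mark, t, p_t))

for class_mark, t, p_t in rows:
    print(f"{class_mark:6.1f}  {t:+6.2f}  {p_t:6.3f}")
```

Because the constant is carried here to full precision rather than rounded to 0.00691, individual values of P(t) may differ from the tabulated ones by a unit in the third decimal place.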

PROBLEMS 

1. Convert the data in Table 4-3-2 to the P(t) form and plot. How does the
value of P(t) for t = -1 compare with the corresponding value in Figure 6-3-1?

2. Write a summary of all the advantages of expressing data in the form of
probability per t unit instead of frequency per class.

4. BINOMIAL DISTRIBUTION 

In the preceding article we saw that many distributions, when expressed
in terms of probability per t unit, reach a maximum of about 0.4 at the
point where t is zero, fall to about 0.25 when t is ±1, and drop to less than
0.1 when t reaches ±2. Beyond this the curves diminish more gradually,
generally reaching zero by the time t is ±2½ or ±3. The objective of this
article is to investigate the possibilities of explaining this characteristic
shape on theoretical grounds. There are two basic questions to be
answered:

(1) What hypotheses must we make about the nature of the causes
governing the sizes of the variates in order to show theoretically that the
probability curve should have the properties described above?

(2) Having accepted these postulates on the grounds of their success in
predicting the shape of a large number of observed probability curves,
what additional information can we extract from them?

Let us begin with the following preliminary assumptions about the
factors which determine the size of any variate. Later we will investigate
the possible modifications of these assumptions:

(1) Each variate consists of a fixed ingredient ($x_0$) which is the same
for all variates, plus a variable ingredient. The size of the variable
ingredient is not determined by a single cause, but by a very large number of
small causes, each of which makes only a small contribution to the total.

(2) The small causes all act independently of each other.

(3) All the contributions made by these causes are equal in size, but
may be positive or negative.

(4) The probability that a given contribution will be positive is 1/2,
and the probability that it will be negative is 1/2.

In practice, it is not necessary for the statistician to know the nature
of the small contributing causes, but it will perhaps be instructive to try
to identify some of them in a specific example. The following is an account
of the work of an astronomer in measuring the output of light from each
of a set of stars. He begins by photographing the unknown stars, and
then, on the same plate, he makes a second exposure, this time using a
field of stars whose brightnesses are already known. After developing the
plate, he sends a beam of light through the image of each unknown star
and then into a photoelectric cell, where its intensity is measured exactly.
He makes a similar measurement upon each of the stars of known
brightness, and from the resulting data he constructs a graph showing the
relationship between the photocell reading and the brightness of the star
which produced the image. Then, using the photocell reading of each
unknown star, he reads off the brightness of the unknown star. Each unknown
star is measured twice, and the difference between the two measures is
computed for each. A frequency tabulation formed from these differences
displays the characteristic bell shape which we are trying to investigate.*
In this case some of the contributing causes have been identified and listed
by the investigator:

(1) The transparency of the sky in the region of the comparison stars
probably differed a little from that of the unknown star.

(2) The comparison exposure may have been a little longer or shorter
than the exposure on the unknown star.

(3) The photographic plate is not exactly uniform in sensitivity, and
the star image may have fallen on a spot of slightly higher or lower than
average sensitivity.

(4) The developer may not have circulated exactly evenly over the
plate.

(5) The temperature of the photographic plate might have changed
slightly between the star exposure and the comparison exposure, thus
changing its sensitivity slightly.

(6) The moisture content of the plate may have changed between the
two exposures, thus altering its sensitivity.

(7) The sensitivity of the photocell is not uniform over its surface, and
the star image may have fallen a little to the left or right of its usual
position, thus striking a region of slightly different sensitivity.

(8) The voltage applied to the photocell may have varied slightly
between the unknown star reading and the comparison star readings.

(9) Small errors are introduced in rounding off the photocell readings.

*See for example the Astronomical Journal, Volume 51, No. 6, page 170.

These contributing causes at first glance fail to fulfil the postulates on
two counts: first, they do not constitute a “large number” of causes, and
second, they do not all make contributions of equal size. In the above
investigation, for example, it was shown that the differences of
transparency of the sky far outweighed any other cause. It is possible,
however, to surmount both of these difficulties by assuming that the large
contributions made by these identifiable causes are themselves made up
of numerous smaller contributions, each too small to be separately
identifiable.

To approach this problem on a mathematical basis, let us introduce the
following notation:

$x_0$ = the fixed ingredient which is present in all the variates.
e = the size of each of the small contributions made by the
       various causes. We will call each of these contributions
       an element.
n = the number of elements present in each variate.
s = the number of these elements which are positive.
n - s = the number of elements remaining which are negative.
P(s) = the probability that there will be exactly s positive elements
       in a given variate.
N = the number of variates.

If you are not accustomed to operating with so large a number of
symbols, you will probably find the mathematical treatment difficult to follow.
If this is the case, you might find it useful to construct an imaginary
situation in which the various quantities are illustrated. The following is
offered as an example. A merchant has been offering for sale an article
for 75¢, but, upon finding that his customers prefer to gamble on the
transaction, he offers instead to toss eight coins and to determine the
selling price as follows: For each head which turns up, the price is
increased 5¢ above the original price, and for each tail it is decreased 5¢.
Let us suppose that 256 customers accept these terms and that the
merchant then studies the distribution of the actual charges for these customers.
In this case, e is 5¢, n is 8, $x_0$ is 75¢, N is 256, x is the actual charge to any
customer, s is the number of heads in any given throw, and P(s) is the
probability that there will be this many heads. For example, P(3) is the
probability that any given customer will throw three heads, and it is
therefore also the probability that any given customer will pay 65¢.

To investigate the probability of occurrence of an x of a given size, we
must find how many positive contributions are necessary to produce an
x of this size. Each variate will consist of $x_0$, plus s positive
contributions of size e, plus n - s negative contributions of size e:

$$x = x_0 + s \times e + (n - s)(-e)$$

or

$$x = x_0 + 2es - en \tag{6-4-1}$$

If we solve this for s, we have

$$s = \frac{x - x_0 + ne}{2e} \tag{6-4-2}$$

For example, for the charge to be 95¢, it is necessary that there must be
(95 - 75 + 8 × 5)/(2 × 5), or 6 heads.
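Equations 6-4-1 and 6-4-2 are inverses of each other, which can be seen by applying them to the merchant's example. The sketch below is in Python and is not part of the original text; the function names are illustrative, and prices are in cents.

```python
def price_from_heads(s, x0=75, e=5, n=8):
    """Equation 6-4-1: the charge x produced by s positive elements (heads)."""
    return x0 + 2 * e * s - e * n


def heads_from_price(x, x0=75, e=5, n=8):
    """Equation 6-4-2: the number of heads s needed to produce the charge x."""
    return (x - x0 + n * e) / (2 * e)


print(price_from_heads(3))   # charge in cents when three heads turn up
print(heads_from_price(95))  # heads needed for a charge of 95 cents
```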
The probability of occurrence of an x of a given size can now be found
by computing the probability that there will be exactly s heads, where s
is related to x by 6-4-2. We can use equation 5-10-2 to obtain this
probability by inserting 1/2 for p and 1/2 for q:

$$P(s) = \frac{n!}{s!\,(n - s)!} \left(\frac{1}{2}\right)^s \left(\frac{1}{2}\right)^{n-s} \tag{6-4-3}$$

Table 6-4-1. Computation of P(s)

  s      P(s)     x (dollars)    NP(s)

  0      1/256       0.35           1
  1      8/256       0.45           8
  2     28/256       0.55          28
  3     56/256       0.65          56
  4     70/256       0.75          70
  5     56/256       0.85          56
  6     28/256       0.95          28
  7      8/256       1.05           8
  8      1/256       1.15           1





The use of this equation is illustrated in Table 6-4-1, where the probability
of occurrence is tabulated for each value of s in the illustrative example.
In the third column are shown the values of x which result from the various
values of s, and in the fourth column are given the theoretical number of
occurrences of each value of s when the number of variates is 256.
Equation 6-4-3 gives us the probability of occurrence of one specific value
of x. If we can rewrite it in the form of a probability per x unit, and then
convert the result to probability per t unit, we will have a theoretical
distribution which we can compare directly with any observed distribution
which is expressed in the same way. To avoid confusion we will make
these transformations separately.
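The tabulation in Table 6-4-1 can also be carried out mechanically. The
following Python sketch is offered only as an illustration; the function
name `coin_price_table` and its parameter names are hypothetical and do not
appear in the text. It applies equation 6-4-3 with p = q = 1/2 and equation
6-4-1, with all prices expressed in cents.

```python
from fractions import Fraction
from math import comb


def coin_price_table(n=8, e=5, x0=75, N=256):
    """Return rows (s, P(s), x, N*P(s)) for the merchant example.

    n: number of coins, e: size of each contribution in cents,
    x0: original price in cents, N: number of customers.
    """
    rows = []
    for s in range(n + 1):
        p = Fraction(comb(n, s), 2 ** n)   # equation 6-4-3 with p = q = 1/2
        x = x0 + 2 * e * s - e * n         # equation 6-4-1
        rows.append((s, p, x, N * p))
    return rows


for s, p, x, expected in coin_price_table():
    # Fractions print in lowest terms, so 8/256 appears as 1/32.
    print(s, p, x, expected)
```

Exact fractions are used here so that the probabilities can be checked
against the table without rounding.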

A. Conversion to x Units. Equation 6-4-3 gives us the probability
of occurrence of a single value of s, that is, the probability per s unit.
To obtain the probability per x unit we must multiply by the number of
s units in one x unit. An inspection of equation 6-4-2 shows that if we
increase x by one unit, s will increase by 1/(2e) units; there are therefore
1/(2e) units of s in one x unit. The probability per x unit is therefore

$$P(x) = \frac{1}{2e}\,\frac{n!}{s!\,(n-s)!}\left(\tfrac{1}{2}\right)^{n}$$

To complete the transformation to x units we must now replace s by the
corresponding value of x, from equation 6-4-2:*

$$P(x) = \frac{1}{2e}\,\frac{n!}{\left(\frac{x - x_0 + ne}{2e}\right)!\,\left(\frac{ne - x + x_0}{2e}\right)!}\left(\tfrac{1}{2}\right)^{n} \tag{6-4-4}$$

where we must choose the values of x in such a way that the terms in the
denominator are positive integers. This gives us the probability per x
unit, expressed in x units. For example, if we insert 95¢ for x, we obtain
7/640 for P(x), which is the probability per penny in the neighborhood
of 95¢.
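As a check on the arithmetic, equation 6-4-4 can be evaluated directly for
the merchant example. The sketch below is illustrative only; the function
name `prob_per_x_unit` is hypothetical, and prices are assumed to be given
in whole cents.

```python
from fractions import Fraction
from math import comb


def prob_per_x_unit(x, n=8, e=5, x0=75):
    """Equation 6-4-4: probability per cent of price in the neighborhood of x."""
    numerator = x - x0 + n * e             # numerator of equation 6-4-2
    if numerator % (2 * e) != 0:
        raise ValueError("x does not correspond to a whole number of heads")
    s = numerator // (2 * e)               # number of heads, equation 6-4-2
    if not 0 <= s <= n:
        raise ValueError("x is outside the range of possible prices")
    # Probability per s unit (6-4-3) times 1/(2e) units of s per x unit.
    return Fraction(1, 2 * e) * Fraction(comb(n, s), 2 ** n)


print(prob_per_x_unit(95))   # 7/640
```

The function refuses any price that cannot arise from a whole number of
heads, which corresponds to the restriction on x stated after equation 6-4-4.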

B. Conversion to t Units. We now wish to express our results in
terms of t units, which are defined by equation 6-3-1. For our purpose
it is convenient to solve this for x:

$$x = \bar{x} + \sigma_x t \tag{6-4-5}$$

We see that if we change t by one unit, x will change by $\sigma_x$ units, and
there are therefore $\sigma_x$ units of x in one t unit. To change equation 6-4-4
to probability per t unit we must therefore multiply by $\sigma_x$:

$$P(t) = \sigma_x P(x)$$

To change our variable from x to t we must substitute for x its value in
terms of t, as given by 6-4-5. Upon making this substitution, we have

$$P(t) = \frac{\sigma_x}{2e}\,\frac{n!}{\left(\frac{\bar{x} + \sigma_x t - x_0 + ne}{2e}\right)!\,\left(n - \frac{\bar{x} + \sigma_x t - x_0 + ne}{2e}\right)!}\left(\tfrac{1}{2}\right)^{n} \tag{6-4-6}$$

*If you find that this treatment takes you beyond your depth mathematically, it is
suggested that you omit the remainder of Article 4 and go directly to Article 5.


C. Evaluation of $\bar{x}$ and $\sigma_x$ in Terms of $\bar{s}$ and $\sigma_s$. We cannot yet
use equation 6-4-6 for a direct comparison with observed distributions
because of the presence of the two unknown quantities $\bar{x}$ and $\sigma_x$. To
evaluate these, let us begin by taking the mean of both sides of equation
6-4-1:

$$\bar{x} = \overline{x_0 + 2es - en} = x_0 + 2e\bar{s} - en \tag{6-4-7}$$

The value of $\sigma_x$ is, by definition, $\sqrt{\overline{(x - \bar{x})^2}}$. From equations 6-4-1 and
6-4-7 we see that $x - \bar{x} = 2es - 2e\bar{s}$, and $\sigma_x$ becomes

$$\sigma_x = \sqrt{\overline{(2es - 2e\bar{s})^2}} = 2e\sqrt{\overline{(s - \bar{s})^2}} = 2e\sigma_s$$

Substituting these values for $\bar{x}$ and $\sigma_x$ in 6-4-6, we have

$$P(t) = \sigma_s\,\frac{n!}{(\bar{s} + \sigma_s t)!\,(n - \bar{s} - \sigma_s t)!}\left(\tfrac{1}{2}\right)^{n} \tag{6-4-8}$$


D. Evaluation of $\bar{s}$ and $\sigma_s$. Our final objective has not yet been
accomplished; we have merely shifted the problem from that of finding
$\bar{x}$ and $\sigma_x$ to that of finding $\bar{s}$ and $\sigma_s$. To find $\bar{s}$ we must multiply each
value of s by the number of times which it occurs, add the results, and
divide by the total number of occurrences. Since we are interested in
theoretical rather than observational results, we must use the expected
number of occurrences, which is simply the number of variates times the
probability that any variate will have the given value of s; it is, in other
words, N × P(s). We can either proceed directly with these expected
occurrences, as shown in the last column of Table 6-4-1, or we can shorten
the work a little by cancelling the N as follows:

$$\bar{s} = \sum sNP(s)/N = \sum sP(s)$$

By the same argument, we can find $\sigma_s$ as follows:

$$\sigma_s = \sqrt{\overline{(s - \bar{s})^2}} = \sqrt{\sum (s - \bar{s})^2 NP(s)/N} = \sqrt{\sum (s - \bar{s})^2 P(s)}$$

The computation of $\bar{s}$ and $\sigma_s$ for n = 8 is shown in Table 6-4-2, from which
we see that $\bar{s}$ is 4 and that $\sigma_s$ is $\sqrt{8}/2$. If we repeat the experiment for
other values of n (the computations are left to the student as an exercise),
we obtain the following:

  n      s̄       σ_s

  8     4.0     √8/2
  7     3.5     √7/2
  6     3.0     √6/2
  5     2.5     √5/2
From this result we generalize as follows:*

$$\bar{s} = n/2 \tag{6-4-9}$$

$$\sigma_s = \sqrt{n}/2 \tag{6-4-10}$$
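The generalization can be checked numerically for any n by applying the two
summation formulas of part D. The sketch below is an illustration only; the
function name `mean_and_sd_of_heads` is hypothetical, and the check is a
numerical one, not a proof.

```python
from math import comb, sqrt


def mean_and_sd_of_heads(n):
    """Mean and standard deviation of s, the number of heads in n tosses."""
    probs = [comb(n, s) / 2 ** n for s in range(n + 1)]       # equation 6-4-3
    mean = sum(s * p for s, p in enumerate(probs))            # sum of s P(s)
    variance = sum((s - mean) ** 2 * p for s, p in enumerate(probs))
    return mean, sqrt(variance)


for n in (5, 6, 7, 8):
    mean, sd = mean_and_sd_of_heads(n)
    # The last two printed columns should agree: computed sd and sqrt(n)/2.
    print(n, mean, round(sd, 4), round(sqrt(n) / 2, 4))
```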

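E. The Final Equation. If we now substitute n/2 for $\bar{s}$ and $\sqrt{n}/2$ for
$\sigma_s$ in equation 6-4-8 we will obtain

$$P(t) = \frac{n!\,\sqrt{n}}{\left(\frac{n + \sqrt{n}\,t}{2}\right)!\,\left(\frac{n - \sqrt{n}\,t}{2}\right)!\;2^{\,n+1}} \tag{6-4-11}$$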


This equation now expresses our theoretical distribution in exactly the
same form as the actual distributions shown in Figure 6-4-1, and it is
possible to compare them directly and to see how closely the actual
distributions come to the theoretical one for any assumed value of n.

Table 6-4-2. Computation of s̄ and σ_s for n = 8

  s      P(s)       sP(s)      s - s̄     (s - s̄)²P(s)

  0      1/256      0/256       -4         16/256
  1      8/256      8/256       -3         72/256
  2     28/256     56/256       -2        112/256
  3     56/256    168/256       -1         56/256
  4     70/256    280/256        0          0
  5     56/256    280/256        1         56/256
  6     28/256    168/256        2        112/256
  7      8/256     56/256        3         72/256
  8      1/256      8/256        4         16/256

                 1024/256                 512/256

  s̄ = 1024/256 = 4.0
  σ_s = √(512/256) = √8/2

Following the assumptions listed at the beginning of this article, we are
primarily interested in the case in which the number of elements is very
large. Accordingly, let us begin by studying equation 6-4-11 with small
values of n, and then see what changes will occur when n becomes
progressively larger. If we begin with n = 9, equation 6-4-11 predicts the
following values of P(t):

    t        P(t)

  0.333     0.369
  1.000     0.246
  1.667     0.105
  2.333     0.026
  3.000     0.0029

*A proof of these equations can be found in Chapter 1 of the second volume of
Kenney's Mathematics of Statistics, D. Van Nostrand Company, 1939.

The resulting probability curve is shown by the solid circles and the curve
in Figure 6-4-1. We see that the curve reaches a maximum of about 0.39,
that it changes curvature between t = 1 and t = 2, that it begins to level
off beyond t = 2, and that it has almost reached zero by t = 3. In short,
it resembles very closely the curve shown in Figure 6-3-1.

If we go to n = 16, we find the following values of P(t):

    t        P(t)

   0.0      0.393
   0.5      0.349
   1.0      0.244
   1.5      0.133
   2.0      0.056
   2.5      0.017
   3.0      0.0037
   3.5      0.0005
   4.0      0.00003

When plotted, these values give us a probability curve which differs very
little from the n = 9 curve, as shown by the open circles in Figure 6-4-1.
The only perceptible differences in the n = 16 curve as compared with the
n = 9 curve are these: first, the central value is a little higher; and second,
the wings of the curve extend out a little farther. If we choose still larger
values of n, the central value builds up a little farther, and the wings extend
farther and farther to the left and to the right, but otherwise the general
shape of the curve differs extremely little from that in Figure 6-4-1.
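The values of P(t) for any chosen n can be generated from equation 6-4-11
without hand computation. The following Python sketch is only an
illustration; the name `binomial_p_of_t` is hypothetical. It uses the fact
that equation 6-4-11 is the same as $\sigma_s$ times the binomial probability of
equation 6-4-3, with $\sigma_s = \sqrt{n}/2$, and it lists only the non-negative
values of t, since the distribution is symmetrical.

```python
from math import comb, sqrt


def binomial_p_of_t(n):
    """Return (t, P(t)) pairs from equation 6-4-11 for every allowed t >= 0."""
    sigma_s = sqrt(n) / 2                      # equation 6-4-10
    pairs = []
    for s in range(n + 1):
        t = (s - n / 2) / sigma_s              # s expressed in t units
        if t >= 0:
            # sigma_s * P(s) equals n! sqrt(n) / (s! (n-s)! 2^(n+1)).
            pairs.append((t, sigma_s * comb(n, s) / 2 ** n))
    return pairs


for n in (9, 16):
    print("n =", n)
    for t, p in binomial_p_of_t(n):
        print(f"  {t:.3f}  {p:.5f}")
```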
The agreement between the theoretical curve (Figure 6-4-1) and the
observational curves (6-3-1 and others) is sufficiently good to indicate that
the assumptions listed at the beginning of this article may apply to many
distributions occurring in various fields. It would, however, be faulty
logic to accept this as positive evidence of the correctness of the postulates.
In particular, if we can predict the observed distributions successfully by
using only some of the assumptions, then it would be unnecessary and
logically faulty to accept the remaining ones. The problem of reducing
to a minimum the number of assumptions which we must make in this
derivation is an important one, and we will return to it in Article 6.

PROBLEMS 

1. Using equation 6-4-11, compute for n = 4 the values of P(t) for t = 0 and
for t = 1.

2. Write a complete derivation of equation 6-4-11, supplying all the steps which
are omitted in the text.


5. NORMAL CURVE 

In the preceding article we showed that equation 6-4-11 represents the
probability curve which is to be expected when the deviation of each
variate from the mean is composed of a large number of small equal
contributions, independent of each other and each having a probability of
1/2 of being positive, and we demonstrated the use of this equation for
several values of n.

This procedure has two weaknesses: first, the computation of P(t)
becomes very laborious if n is large; second, the equation is applicable
only for those values of t such that $(n \pm \sqrt{n}\,t)/2$ is a whole number.
The first difficulty is particularly inconvenient since it is precisely the
large values of n in which we are interested; in fact, we would like to find
what the distribution is like if n becomes larger and larger without limit.
Fortunately it is possible, by means of mathematical analysis involving
calculus, to find what happens to P(t) as n becomes indefinitely large.
The result can be written in the following simple form:

$$P(t) = \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2} \tag{6-5-1}$$

where e is an abbreviation for the number 2.718 . . . .* The curve obtained
by plotting equation 6-5-1 is called the normal curve, and any distribution
which approximates this curve is called a normal distribution. To show
that it gives the limiting value of P(t) as n becomes very large, a few
specific values of P(t) are shown in Table 6-5-1, computed for n = 4, 9, 16,
25, and 36. In the last column are shown the corresponding values
obtained from equation 6-5-1. An inspection of the table shows that for
large values of n the difference between the two equations becomes
negligible. We therefore adopt equation 6-5-1 as the equation for the
theoretical form to be expected for any distribution which fulfills the basic
assumptions listed at the beginning of Article 4.

Table 6-5-1. P(t) for Increasing Values of n

                     Equation 6-4-11                      Equation 6-5-1

  t     n = 4     n = 9     n = 16    n = 25    n = 36

  0     0.3750              0.3928              0.3962        0.3989
  1     0.2500    0.2461    0.2444    0.2435    0.2431        0.2420
  2     0.0625              0.0555              0.0546        0.0540
  3               0.0029    0.0037    0.0040    0.0041        0.0044

*The number e has a standard abbreviation because it is widely used in advanced
mathematics. Some readers may be familiar with it in connection with its use as the
base of the so-called Napierian or natural system of logarithms.
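The approach of equation 6-4-11 to equation 6-5-1 can be followed to values
of n far beyond those which are convenient by hand. The sketch below is
illustrative only; the function names are hypothetical, and the values of n
are chosen as perfect squares so that t = 1 is an allowed value in every case.

```python
from math import comb, exp, pi, sqrt


def normal_ordinate(t):
    """Equation 6-5-1."""
    return exp(-t * t / 2) / sqrt(2 * pi)


def binomial_ordinate(n, t):
    """Equation 6-4-11; returns None when t is not an allowed value for n."""
    s = (n + sqrt(n) * t) / 2
    nearest = round(s)
    if abs(s - nearest) > 1e-9 or not 0 <= nearest <= n:
        return None                      # (n + sqrt(n) t)/2 is not a whole number
    return sqrt(n) / 2 * comb(n, nearest) / 2 ** n


for n in (4, 16, 36, 100, 400):
    print(n, round(binomial_ordinate(n, 1.0), 4), round(normal_ordinate(1.0), 4))
```

The blank entries of Table 6-5-1 correspond to the cases in which
`binomial_ordinate` returns `None`.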

The computation of P(t) by means of equation 6-5-1 can be performed
by means of the principles described in Chapter 3. For example, to
evaluate P(t) for t = 3, we begin by evaluating the exponent, which is
$-3^2/2$ or -4.5. To proceed from here, let us evaluate $e^{4.5}$ by means of
equation 3-2-3, which tells us that $\log e^{4.5} = 4.5 \log e$. Log e is 0.4343, from
which we see that $\log e^{4.5} = 1.954$. We look up the antilog of 1.954 and
see that $e^{4.5}$ is equal to 90.0. From equation 3-3-6 we see that $e^{-4.5}$ must
be $1/e^{4.5}$ or 1/90.0. The remainder of the computation can be completed
most quickly by slide rule, giving us $P(3) = (1/\sqrt{2 \times 3.1416}) \times (1/90)$
= 0.0044.
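The same steps can be retraced in a short Python sketch, given here only as
an illustration of the logarithmic route described above; the function name
`ordinate_by_logs` is hypothetical, and common (base 10) logarithms are
assumed, as the value 0.4343 for log e implies.

```python
from math import e, log10, pi, sqrt


def ordinate_by_logs(t):
    """Evaluate equation 6-5-1 by way of common logarithms."""
    exponent = t * t / 2                 # 4.5 when t = 3
    log_power = exponent * log10(e)      # log e is about 0.4343
    power = 10 ** log_power              # the antilog, about 90.0 when t = 3
    return 1 / (sqrt(2 * pi) * power)    # e^(-t^2/2) is 1/power


print(round(ordinate_by_logs(3), 4))     # 0.0044
```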

PROBLEMS 

1. Using equation 6-5-1, compute P(t) for t = 0, t = 1, t = 2, t = -2.5, and
t = -4.

6. APPLICABILITY OF NORMAL CURVE 

The assumptions described in the preceding article are sufficient for the
derivation of the equation of the normal curve, and we can assert that any
distribution which fulfils these assumptions will have an approximately
normal form. Actually, it is possible to derive the normal curve from still
less restrictive assumptions and thus to broaden the scope of the cases in
which we should expect normal distributions. The mathematical treatment
required for these derivations is complex, and we will limit ourselves
to summarizing their conclusions.
As a first step in generalizing our results, let us consider the effect of
removing the fourth assumption listed in Article 4, and let us substitute
the more general condition that the probability that an element is positive
is some fixed number p, and the probability that it is negative is a fixed
number q, where q = 1 - p. The mathematical treatment is similar to
that in Article 4, and the conclusions are as follows:

(1) The mean of the resulting distribution is not $x_0$, but is now
$x_0 + (p - q)ne$.

(2) The standard deviation of the resulting distribution, in terms of s,
is no longer $\sqrt{n}/2$, but is

$$\sigma_s = \sqrt{npq} = \sqrt{np(1 - p)} \tag{6-6-1}$$

or, in terms of x, $\sigma_x = 2e\sqrt{npq}$. Equation 6-6-1 is displayed for future
reference in later chapters.

(3) The limiting form of the distribution, as n becomes larger and
larger, is again the normal curve, regardless of the values of p and q.

To generalize still further, it is possible to remove the restriction that
all the elements must have the same size. Instead we can picture the
elements as having various sizes (as long as they are all very small) and
assume only that the probability of occurrence of an element of a given
size is a fixed number. Under these conditions it is still possible to derive
the equation of the normal curve. In short, the only essential conditions
are the first and second ones listed in Article 4.
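Conclusions (1) and (2) can be checked numerically for any chosen n and p.
The sketch below is an illustration only; the function name `heads_mean_sd`
and the particular values n = 20 and p = 0.3 are arbitrary assumptions, and
e and $x_0$ are borrowed from the merchant example of Article 4.

```python
from math import comb, isclose, sqrt


def heads_mean_sd(n, p):
    """Mean and standard deviation of s when each element is positive with probability p."""
    q = 1 - p
    probs = [comb(n, s) * p ** s * q ** (n - s) for s in range(n + 1)]
    mean = sum(s * w for s, w in enumerate(probs))
    sd = sqrt(sum((s - mean) ** 2 * w for s, w in enumerate(probs)))
    return mean, sd


n, p, e, x0 = 20, 0.3, 5, 75                 # arbitrary illustrative values
mean_s, sd_s = heads_mean_sd(n, p)
mean_x = x0 + 2 * e * mean_s - e * n         # mean of equation 6-4-1

print(isclose(sd_s, sqrt(n * p * (1 - p))))            # equation 6-6-1
print(isclose(mean_x, x0 + (p - (1 - p)) * n * e))     # conclusion (1)
```

Both comparisons are expected to hold to within rounding error for any n
and any p between 0 and 1.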

These two important conditions can be mastered most quickly by studying
some distributions which result from situations in which the conditions
are violated. An example of such a distribution is shown in Figure 6-6-1,
which shows the result of measuring the length of the glumes of 595
individual wheat plants. The histogram shows three well-defined maxima,
and no resemblance whatever to the normal curve. The explanation for
this is revealed by an examination of the source of the data. The wheat
plants were a cross between Rivet wheat (with an average glume length
of 9 millimeters) and Polish wheat (with an average glume length of 28
millimeters). The sample shows marked maxima near these two points,
and a third maximum intermediate between them. It appears highly
likely that the first of the two conditions has been violated, namely that
all the causes of differences between the variates should be small. Here
there is obviously a very large cause at work, namely the differences in the
inheritance factor, which enters on an all-or-nothing basis. Furthermore,
we cannot escape from this first condition by supposing that such an
inheritance factor (say, for example, the inheritance of the Rivet genes) is
itself made up of a large number of contributing factors, because if we
adopt this view then we violate the second condition, namely, that all the
contributing factors must be independent. Obviously such hypothetical
small contributing factors are not independent, since they are inherited
all together or not at all.

Figure 6-6-1. Lengths of Wheat Glumes. (Based upon data from
"The Combination of Observations" by David Brunt, by permission of the
Cambridge University Press.)

Figure 6-6-2 shows the results of drawing sets of 15 cards at random
from a deck and counting the number of aces in each set, each set
being returned to the deck and mixed before the next set is drawn.
The histogram shows a general resemblance to the normal curve, but is
conspicuously unsymmetrical. Here the first condition is roughly fulfilled,
but the second condition is violated, because the probabilities for the
separate elements are not independent. On the contrary, the probability
that any one of the fifteen cards will be an ace is strongly influenced
by the number of aces already drawn in that set.
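The theoretical probabilities for such an experiment can be computed by
counting combinations. The following sketch is an illustration only; it
assumes an ordinary deck of 52 cards containing 4 aces, which the text does
not state explicitly, and the function name `aces_distribution` is
hypothetical.

```python
from math import comb


def aces_distribution(draws=15, aces=4, deck=52):
    """Probability of k aces (k = 0, 1, ..., aces) in a set drawn without replacement."""
    total = comb(deck, draws)                       # all possible sets
    return [comb(aces, k) * comb(deck - aces, draws - k) / total
            for k in range(aces + 1)]


probs = aces_distribution()
for k, p in enumerate(probs):
    print(k, round(p, 4))

mean = sum(k * p for k, p in enumerate(probs))
print("mean number of aces:", round(mean, 3))
```

Under these assumptions the probabilities are visibly unsymmetrical about
their largest value, in general agreement with the description of the
histogram.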

The final justification for the use of the normal curve in any practical
problem must lie in its success in predicting a distribution similar to the
observed one. You should make an observational check, if possible,
before adopting the hypothesis that a given distribution is normal, even
though the causes operating upon the variates appear to fulfil the
conditions for the normal curve.

PROBLEMS 

1. Would you expect a normal distribution, approximately, for the number of
heads obtained by tossing twenty coins a large number of times?

2. Would you expect a normal distribution for the heights of a large number of
American students of both sexes? Of American men? Of mixed American and
Japanese men? Explain.

3. If you counted the number of errors made each time by a rat in running a
maze a large number of times, would you expect the resulting distribution to be
normal? Explain in full.
7. NORMAL CURVE TABLES

The normal curve is used so widely in statistics that statisticians have
found it to their advantage to precompute and tabulate a set of values of
P(t) for all values of t which are likely to be needed in practical problems.
Such tables are widely available, and it is therefore not necessary in
practice to compute P(t) from equation 6-5-1. It is the purpose of this article
to describe the standard normal curve tables and to explain their uses.
The tables reproduced in the back of this book (Appendices IV and V)
contain tables as follows:

Appendix IV. These numbers, labeled "Ordinates of the Normal
Curve," are the values of P(t) corresponding to each tabulated value of t.
The whole number and the first decimal of t are found in the left-hand
column, and the second decimal is found in the top row. The value of
P(t) is then found in the body of the table. For example, to find the value
of P(t) for t = 1.52, we find 1.5 in the left column and 0.02 in the column
headings. The number in the body of the table is then 0.12566. For most
practical problems these values of P(t) should be rounded off to four or
three decimals.

Appendix V. These numbers, labeled "Areas under the Normal Curve,"
are measures of the total area under the normal curve between its center
and the tabulated value of t. As we showed in Article 2, these areas measure
the probability that a variate chosen at random will lie between these
limits. For example, we see from the table that the area given for t = 2.93
is 0.49831. This is the probability that a variate chosen at random will
lie between t = 0 and t = 2.93. For convenience we will use the
abbreviation A(t) for these areas.

The normal curve tables can be used directly to find the probability
that a variate chosen at random will fall between any given limits. The
details are shown in the following illustrations.

(1) What is the probability of occurrence of a variate between t = 1.0
and t = 1.2? Answer: Since the interval is small, we can assume that
the mean value of P(t) over the interval is not perceptibly different from
its value at the midpoint. Accordingly we read P(t) from the tables for
t = 1.1 and find it to be 0.2178. This is the probability per t unit, and we
must multiply it by the number of t units in the interval, which is 0.2. The
required probability is 0.2178 times 0.2, or 0.04356. In other words,
about forty-four variates out of a thousand should fall within this interval
if the distribution is normal.

(2) What is the probability that a variate chosen at random will lie
between -1 and +2 in t units? Answer: Since the interval is wide,
P(t) will vary greatly from one side of the interval to the other, and we
cannot use the method of the preceding problem. Instead we must find
the area under the curve between these limits. We find that A(1) is
0.3413; this is the probability of occurrence of a variate between t = 0
and t = 1. Since the curve is symmetrical, it is also the probability of
occurrence of a variate between -1 and zero. We see also that A(2)
is 0.4772; this is the probability of occurrence of a variate between zero
and 2. The relationship of these two areas is shown in Figure 6-7-1. By
adding the two probabilities we find that the required probability is 0.8185.
In other words, about 82 per cent of the variates should lie between -1
and +2 for any distribution which is normal.

(3) What is the probability that a variate chosen at random will lie
between +1 and +2 on the t scale? Here we find, as before, that A(1) =
0.3413 and A(2) = 0.4772, but now these are related as shown in Figure
6-7-2, and it is necessary to subtract the smaller from the larger in order
to obtain the area between 1 and 2. The required probability is therefore
0.1359.

(4) What is the probability of occurrence of a variate more than three
standard deviations away from the mean? Answer: We find from the
tables that A(3) is 0.4986; this is the probability that a given variate will
lie between zero and 3 in t units. The probability that a variate will lie
between -3 and +3 is twice as large, or 0.9972. The probability that a
given variate will not lie between these limits is obtained by subtracting
this from 1. This gives us 0.0028 for the probability that a given variate
will fall outside of these limits.
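Where printed tables are not at hand, P(t) and A(t) can be computed
directly. The sketch below is illustrative only; the function names
`ordinate` and `area` are hypothetical, and the area is obtained from the
error function of Python's standard `math` module, a route which is not
the one used to construct the Appendices. Because the tabulated values are
rounded to four places, the results may differ from those above by a unit
in the last place.

```python
from math import erf, exp, pi, sqrt


def ordinate(t):
    """P(t): height of the normal curve, equation 6-5-1."""
    return exp(-t * t / 2) / sqrt(2 * pi)


def area(t):
    """A(t): area under the normal curve between its center and t."""
    return 0.5 * erf(t / sqrt(2))


print(round(ordinate(1.1) * 0.2, 5))       # illustration (1), midpoint method
print(round(area(1) + area(2), 4))         # illustration (2), between -1 and +2
print(round(area(2) - area(1), 4))         # illustration (3), between +1 and +2
print(round(1 - 2 * area(3), 4))           # illustration (4), beyond 3 standard deviations
```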

PROBLEMS 

1. Find the probability of occurrence of a variate between 2.4 and 3.0 in t units,
using the P(t) tables.

2. Repeat Problem 1, using the A(t) tables, and compare the results. Is this a
more or a less accurate procedure than that of Problem 1?

3. Use the A(t) tables to find the probability of occurrence of a variate between
2 and 3 in t units.


8. PROPERTIES OF NORMAL CURVE 

By using the tables as shown in the preceding article, the following
properties of the normal curve can readily be demonstrated.

(1) Approximately 68 per cent of the variates lie within one standard
deviation from the mean.

(2) Approximately 95 per cent of the variates lie within two standard
deviations from the mean.

(3) Approximately 99.7 per cent of the variates lie within three standard
deviations from the mean, and only 0.3 per cent lie beyond this range.
These percentages are shown as areas in Figure 6-8-1.

(4) Half of the variates should lie within 0.6745 t unit from the mean.

(5) At 1 t unit from the center, the height of the normal curve is
0.2420/0.3989, or about 0.6 as high as it is at the center.



The fourth property is useful in connection with a measure of dispersion
called probable error, which will be discussed briefly here. The probable
error (PE) is that deviation from the mean such that the probability that
a variate chosen at random will have a smaller deviation is one half. It is,
in other words, the half-width of the range containing half of the variates.
This meaning is demonstrated in Figure 6-8-2. From the normal curve
tables we see that, if the distribution is normal, the probable error is
related to the standard deviation by the simple relationship

$$PE = 0.6745\,\sigma \tag{6-8-1}$$

The probable error is widely used as a measure of the uncertainty of
results in the exact sciences, and it is described here for the sake of those
students who plan to work in these fields. In all other fields the same
results are customarily described in terms of standard deviation. It is
perhaps to be regretted that two such measures are in widespread use,
when one would have served as well.
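The five properties, and the factor 0.6745 of equation 6-8-1, can be checked
without tables. The sketch below is illustrative only; it relies on the
`NormalDist` class of Python's standard `statistics` module (available in
Python 3.8 and later), and the helper name `within` is hypothetical.

```python
from statistics import NormalDist

standard = NormalDist()          # mean 0, standard deviation 1, i.e. t units


def within(k):
    """Fraction of the variates lying within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)


for k in (1, 2, 3):
    print(k, round(within(k), 4))                 # properties (1) to (3)

# Property (4): half of the variates lie below the 75th percentile and above
# the 25th, so the probable error in t units is the 75th percentile.
print(round(standard.inv_cdf(0.75), 4))

# Property (5): ratio of the height at t = 1 to the height at the center.
print(round(standard.pdf(1) / standard.pdf(0), 4))
```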

PROBLEM 

1. Using the normal curve tables, verify the five properties of the normal curve
listed in the above article.
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9. FITTING NORMAL CURVE 

Before accepting the hypothesis that a given distribution is normal, it 
is desirable to make an exact comparison between the distribution and the 
normal curve This can be done m either of two ways first, we can reduce 
the observed data to probability per t unit and plot it agamst t, as we did 
m Figure 6-3-1, and then plot the normal curve on the same diagram, 
second, we can compute the frequencies which would be expected if the 
distribution were exactly normal and plot these predicted frequencies on 
the histogram of the data The second procedure will be adopted here. 
It can be learned most quickly by study of the illustrative example m 
Table 6-91, which contains the scores of 206 freshmen on the Thorndyke 
Intelligence Test * The specific steps are as follows 

(1) Compute x and cr for the distribution. This step has been omitted 

from Table 6-9-1 For this data, x is 81 59 and a is 12 14 ^ 

(2) List the boundaries (#&) of all classes (third column) These should 
be staggered, so that each boundary comes between two classes, as shown 

(3) Convert the x values of these boundaries into t values This is best 
accomplished m two steps* first, subtract x from each x b (fourth column); 
second, divide the results by a (fifth column) 

(4) Read from the normal curve tables the values of A{t) corresponding 
to each of these values of t (sixth column) 

(5) Compute the probability for each class (seventh column) This is 
done by subtracting values of A (t) m pairs For example, the probability 
of occurrence of a variate between t — 0 and t = —3 06 is 0 4989, as we 
see from the first entry in the sixth column, and the probability of occur- 
rence of a variate between t = 0 and t = — 2 64 is only 0 4959, as we see 
from the second entry m this column The probability of occurrence of a 
variate between —3 06 and —2 64 is obviously 0 4989 minus 0 4959, or 
0 0030, which we enter m the seventh column The only exception to this 
procedure occurs in the 80 to 84 class, m which i changes from a negative 
to a positive value For this class we see that the probability between 
minus 0 17 and zero is 0 0675, while that from zero to plus 0 24 is 0 0948. 
The total probability for the class (0 1623) is obtained by adding these 
two These probabilities m the seventh column should again be staggered 
as shown, smce each represents the probability between two boundaries. 

(6) In the last column are given the predicted frequencies (f p ) To 
compute these, we multiply the total number of variates (N) by the 
probability that any of these variates will fall in the given class For 
example, the predicted probability that a variate will fall m the first class 
is 0 0030, and we find the predicted frequency by multiplying this by 206, 
obtaining 0 6 


*For source of data, see Figure 1-4-3 
Table 6-9-1. Fitting Normal Curve

  Limits      f    Boundaries   x_b - x̄   (x_b - x̄)/σ    A(t)      Prob.    N × Prob.
                     (x_b)                     (t)                              (f_p)

                      44.5      -37.09       -3.06      0.4989
  45- 49      1                                                    0.0030       0.6
                      49.5      -32.09       -2.64      0.4959
  50- 54      2                                                    0.0088       1.8
                      54.5      -27.09       -2.23      0.4871
  55- 59      2                                                    0.0215       4.4
                      59.5      -22.09       -1.82      0.4656
  60- 64     10                                                    0.0449       9.2
                      64.5      -17.09       -1.41      0.4207
  65- 69     15                                                    0.0794      16.4
                      69.5      -12.09       -1.00      0.3413
  70- 74     27                                                    0.1223      25.2
                      74.5      - 7.09       -0.58      0.2190
  75- 79     37                                                    0.1515      31.2
                      79.5      - 2.09       -0.17      0.0675
  80- 84     30                                                    0.1623*     33.4
                      84.5      + 2.91       +0.24      0.0948
  85- 89     34                                                    0.1474      30.4
                      89.5      + 7.91       +0.65      0.2422
  90- 94     18                                                    0.1132      23.3
                      94.5      +12.91       +1.06      0.3554
  95- 99     13                                                    0.0752      15.5
                      99.5      +17.91       +1.48      0.4306
 100-104     10                                                    0.0400       8.2
                     104.5      +22.91       +1.89      0.4706
 105-109      4                                                    0.0187       3.9
                     109.5      +27.91       +2.30      0.4893
 110-114      2                                                    0.0073       1.5
                     114.5      +32.91       +2.71      0.4966
 115-119      1                                                    0.0025       0.5
                     119.5      +37.91       +3.12      0.4991

            206     x̄ = 81.59    σ = 12.14

*Where t changes sign, the values of A(t) must be added.


(7) Plot the values of $f_p$ on the original histogram, and connect them by
means of a smooth curve, as shown in Figure 6-9-1.

The suitability of the normal curve is judged by the quality of the fit
of the normal curve to the original histogram, or by the agreement between
the predicted and the observed frequencies in the table. In Table 6-9-1,
for example, we see that the normal curve predicts a frequency of 16.4
for the fifth class, while the class actually contains a frequency of 15.
This difference could easily arise from accidents of sampling, and, in fact,
all the observed frequencies are compatible with the hypothesis that the
universe from which the sample came is distributed normally. More
exact tests to measure the probability that a given distribution is normal
will be described in Chapter 11.
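Steps (3) through (6) of the fitting procedure lend themselves to mechanical
computation. The following Python sketch is offered only as an illustration;
the function names are hypothetical, and the mean and standard deviation
are simply taken from Table 6-9-1. It works with the total area below each
boundary instead of A(t), which avoids the special treatment of the class
in which t changes sign. Because it does not round t to two decimals, its
predicted frequencies may differ slightly from those in the table.

```python
from math import erf, sqrt


def area_below(t):
    """Fraction of a normal distribution lying below t (in t units)."""
    return 0.5 * (1 + erf(t / sqrt(2)))


def predicted_frequencies(boundaries, mean, sd, total):
    """Expected number of variates between each pair of adjacent boundaries."""
    below = [area_below((b - mean) / sd) for b in boundaries]
    return [total * (hi - lo) for lo, hi in zip(below, below[1:])]


# Observed frequencies and class boundaries from Table 6-9-1.
observed = [1, 2, 2, 10, 15, 27, 37, 30, 34, 18, 13, 10, 4, 2, 1]
boundaries = [44.5 + 5 * i for i in range(16)]       # 44.5, 49.5, ..., 119.5
predicted = predicted_frequencies(boundaries, 81.59, 12.14, sum(observed))

for low, f, fp in zip(boundaries, observed, predicted):
    # Print class limits, observed frequency, predicted frequency.
    print(f"{low + 0.5:5.0f}-{low + 4.5:3.0f}  {f:3d}  {fp:5.1f}")
```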

PROBLEMS 

1. Fit a normal curve to the data in Table 4-3-2, and graph the results as shown
in Figure 6-9-1. For which class is the disagreement between the predicted and
the observed frequencies the worst?

2. Fit a normal curve to the data in Table 4-5-1.

10. USES OF NORMAL CURVE

After the investigator has tested a distribution by the methods of the
preceding article, and has accepted the hypothesis that the universe from
which it came is approximately normal, he is justified in using the normal
curve tables to predict the probabilities or frequencies for any future
samples from the same universe. The practical utility of this procedure
can be described most rapidly by means of a few illustrative examples.

I. A manager of a factory plans to establish a new plant, which will
require eighteen skilled men for supervisory positions. He knows from
experience that only those men who make a score of 115 or above on a
given aptitude test are likely to succeed in these positions. He gives this
test to the first twenty-five men who apply for employment in the new
locality and finds the distribution of scores shown in Table 6-10-1. How
many men should he plan to interview and test in order to be reasonably
certain of finding at least eighteen who will score 115 or above?


Table 6-10-1. Scores of Twenty-Five Applicants

   Score       f

   70- 79      2
   80- 89      3
   90- 99      9
  100-109      8
  110-119      2
  120-129      1


For this problem, the obvious procedure would be to find what fraction
of the sample had scores of 115 or above, and then divide 18 by this
fraction to find the required number of applicants. This procedure is
hopelessly weak because the sample is so small that this fraction is very poorly
determined by the observations. As an alternative to this we proceed as
follows:

(1) We assume that the distribution is normal, both on the grounds that
a normal distribution is generally to be expected for variates of this nature
and on the grounds that a normal curve fits the available observational
data fairly well.

(2) We reduce the score of 114.5 to a t score. The mean of the sample
is 97.7 and the standard deviation is 11.6, giving us (114.5 - 97.7)/11.6,
or 1.45 for the score of 114.5 expressed in t units.

(3) We find that A(t) for this t score is 0.4265. Therefore, 42.65 per
cent of the applicants should be expected to have scores between 97.7 and
114.5, and of course 50 per cent should be expected to have scores lower
than 97.7. This leaves 7.35 per cent whose scores should be above 114.5.

(4) We must now select a number N such that 7.35 per cent of N is
greater than 18. N must therefore be greater than 18/0.0735, or 245. It
would be reasonable for the manager to be prepared to test about 250 or
300 applicants in order to be fairly confident that he would obtain the
needed 18 men with scores of 115 or above.
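The four steps can be followed in a short Python sketch, offered only as an
illustration. The function names are hypothetical; the mean and standard
deviation are computed from the class midpoints of Table 6-10-1 (dividing
by N), and the tail area is obtained from the error function in place of
the A(t) table. Since t is not rounded to two decimals, the resulting N may
differ by a unit or two from the figure of 245 obtained above.

```python
from math import ceil, erf, sqrt


def grouped_mean_sd(midpoints, freqs):
    """Mean and standard deviation of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / n
    return mean, sqrt(variance)


def fraction_above(score, mean, sd):
    """Fraction of a normal distribution lying above the given score."""
    t = (score - mean) / sd                       # step (2)
    return 0.5 * (1 - erf(t / sqrt(2)))           # step (3): 0.5 - A(t)


midpoints = [74.5, 84.5, 94.5, 104.5, 114.5, 124.5]   # from Table 6-10-1
freqs = [2, 3, 9, 8, 2, 1]

mean, sd = grouped_mean_sd(midpoints, freqs)
fraction = fraction_above(114.5, mean, sd)
needed = ceil(18 / fraction)                      # step (4)

print(round(mean, 1), round(sd, 1), round(fraction, 4), needed)
```

The value of `needed` is only the number for which the expected count is 18;
as the text points out, the manager would be wise to allow a margin beyond it.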

II. A turkey raiser sells his turkeys on a sliding price basis, depending
upon the weights of the turkeys. The price scale is shown in Table 6-10-2,
which also shows the distribution of weights per hundred turkeys in his
flocks.

Table 6-10-2. Distribution of Turkey Weights

  Weight     No. per 100     Price per Pound

   7- 9           0              $0.75
   9-11           1              $0.75
  11-13           5              $0.65
  13-15          23              $0.65
  15-17          36              $0.55
  17-19          27              $0.55
  19-21           6              $0.40
  21-23           2              $0.40

He is offered a new contract for his turkeys under which he will
be paid as follows:

Weight            Price per Pound
 6.0 to  8.5           80¢
 8.5 to 11.0           70¢
11.0 to 13.5           60¢
13.5 to 16.0           55¢
16.0 to 18.5           53¢
18.5 to 21.0           51¢
21.0 to 23.5           49¢

Assuming that the distribution is normal, how many turkeys per 100
should he expect to fall into each of the new classifications? Would it be
to his financial advantage to accept the new contract or to continue under
the previous arrangement?

To solve this problem it is necessary to compute the arithmetic mean
and the standard deviation of the distribution and then to compute the
predicted frequencies in the new classes, using the procedure shown in
Table 6-9-1. The values of the boundaries used should of course be the
boundaries of the new classes, namely, 6.0, 8.5, 11.0, and so forth. The
details are left to the student as an exercise.

The operation described in the above paragraph is called “graduation
of frequencies,” and is frequently performed in statistical work. Further
important uses of the normal curve will be described in future chapters.
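
As a rough sketch of what graduation of frequencies involves, the Python
fragment below fits a normal curve to the weights in Table 6-10-2 and
reads off the predicted frequency in each new class. It is not the tabular
procedure of Table 6-9-1 itself: the class marks 8, 10, and so forth are
assumed to be the midpoints of the old classes, and NormalDist from the
standard library replaces the printed normal curve tables.

```python
from math import sqrt
from statistics import NormalDist

# Table 6-10-2: assumed class marks (pounds) and turkeys per hundred.
old_marks = [8, 10, 12, 14, 16, 18, 20, 22]
old_freqs = [0, 1, 5, 23, 36, 27, 6, 2]

n = sum(old_freqs)
mean = sum(f * x for f, x in zip(old_freqs, old_marks)) / n
sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(old_freqs, old_marks)) / n)

# Boundaries of the classes in the new contract.
new_bounds = [6.0, 8.5, 11.0, 13.5, 16.0, 18.5, 21.0, 23.5]

# Predicted frequency = N times the area of the fitted normal curve
# between the two boundaries of the class.
fitted = NormalDist(mean, sd)
predicted = [
    n * (fitted.cdf(upper) - fitted.cdf(lower))
    for lower, upper in zip(new_bounds, new_bounds[1:])
]

for lower, upper, p in zip(new_bounds, new_bounds[1:], predicted):
    print(f"{lower:4.1f} to {upper:4.1f}: {p:5.1f} per 100")
```

The predicted frequencies sum to slightly less than 100 because a small
part of the fitted curve lies outside the range 6.0 to 23.5.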

PROBLEMS 

1. In illustrative problem I, answer the following further questions: (a) How
many applicants would it be necessary to interview to be fairly certain of finding
five with a score above 128? (b) If 800 applicants are to be examined, how many of
them would you expect to have scores between 78 and 108?

2. Complete the illustrative problem concerning the turkey raiser. Find the
predicted frequency for each of the new classes and the total payment which he
would receive per hundred turkeys under each contract.

3. For the data in Table 1-4-3, if a sample of 1000 wires were chosen to be
tested from the same machine, how many would you expect to have breaking
strengths under 200 pounds? Under 195 pounds? With this information in mind,
answer Problem V, Article 4, Chapter 1.

11. SUMMARY

The material in this chapter falls into three parts, as follows: I. The
technique of expressing any distribution in a standard form, in which it
can be compared with any other distribution expressed in the same form.
II. A theoretical explanation of the similarity of many distributions when
expressed in this standard form. III. Uses of the theoretical conclusions
so reached. The procedures and the central results are as follows:

I. Any distribution can be presented in a standard form in which it is
independent of the choice of class interval, size of sample, and units of
measurement of the variate. The process of converting the data to this
standard form consists of (1) converting the x values to t units, where t
is defined as \( (x - \bar{x})/\sigma \), and (2) converting the frequencies to
probabilities per t unit, or P(t), which is computed from equation 6-3-2:

\[ P(t) = f\sigma/(NC) \]    (observed values of P)

The procedure is illustrated in Table 6-3-1 and Figure 6-3-1. The purposes
of this procedure are as follows:

(1) To make comparisons between distributions which have different
class intervals, sample sizes, or x units.

(2) To study and describe the properties which various distributions
have in common, as a starting point for the theoretical study of these
properties in part II.

(3) To make possible the estimation of the probability of an occurrence
between various values of x or between various values of t. This is
accomplished by measuring the area under the probability curve, as shown
in Figure 6-2-1, and as described in the accompanying text.

(4) To remove the distortion which is produced when data is tabulated
in classes of different widths.
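
The conversion to standard form can be sketched in a few lines of Python.
The function name standard_form is our own, and the data are those of
Table 6-10-1 with class marks assumed to be the class midpoints; the
formula is equation 6-3-2, P(t) = f sigma / (N C).

```python
from math import sqrt

def standard_form(class_marks, freqs, class_interval):
    """Return a list of (t, P(t)) pairs for a frequency tabulation."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, class_marks)) / n
    sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks)) / n)
    return [
        ((x - mean) / sd, f * sd / (n * class_interval))
        for f, x in zip(freqs, class_marks)
    ]

points = standard_form(
    [74.5, 84.5, 94.5, 104.5, 114.5, 124.5],
    [2, 3, 9, 8, 2, 1],
    class_interval=10,
)
for t, p in points:
    print(f"t = {t:+.2f}   P(t) = {p:.3f}")
```

Each class is C/sigma wide in t units, so the values of P(t) multiplied by
that width add up to 1, whatever class interval, sample size, or units the
original tabulation used.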

II. When we study a number of observed distributions in the standard
form described above, we find that many of them have almost exactly the
same shape, shown in Figure 6-3-1 and described in the opening paragraph
of Article 4. This characteristic shape can be explained theoretically
if we adopt the following hypotheses concerning the causes of differences
between the variates:

(1) Each variate differs from the mean not as a result of a single cause
or a few large causes, but as a result of a large number of causes, each of
which contributes only a relatively very small amount to the final value
of the variate.

(2) Each of these small causes is independent of all the other small
causes.

On the basis of these hypotheses, it is possible to show theoretically that
the resulting distribution in P(t) form should fulfil equation 6-5-1:

\[ P(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2} \]    (theoretical values of P)

where e = 2.718... The curve obtained by plotting this equation is
called the normal curve, and any distribution which approximates it is
called a normal distribution. Properties of the normal curve are
described in Article 8 and illustrated in Figures 6-8-1 and 6-8-2. In practical
use of the normal curve, the values of P(t) are not computed from the
equation, but are instead read from precomputed tables. The use of such
tables is described in Article 7.
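
Where no table is at hand, the normal curve is easily evaluated directly.
The sketch below computes the height of the curve, exp(-t^2/2)/sqrt(2 pi),
and the area A(t) between the mean and t; the function name normal_p is
our own, and the area is taken from the standard library's NormalDist
rather than from a printed table.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_p(t):
    """Height of the normal curve at t: exp(-t^2 / 2) / sqrt(2 pi)."""
    return exp(-t * t / 2) / sqrt(2 * pi)

for t in (0.0, 0.5, 1.0, 1.45, 2.0):
    area = NormalDist().cdf(t) - 0.5  # A(t): area from the mean to t
    print(f"t = {t:.2f}   P(t) = {normal_p(t):.4f}   A(t) = {area:.4f}")
```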

III. A standard procedure in statistics is to fit a normal curve to a given
set of data. The purpose of this is usually twofold: first, by comparing
the distribution with the best fitting normal curve, it is possible to
confirm or reject the hypothesis that the universe from which the sample came
was distributed normally. Many further statistical conclusions depend
upon this hypothesis, and it is important that it be tested in this way.
Second, after the hypothesis has been confirmed, the normal curve tables
can be used to predict the frequency of occurrence to be expected in any
future samples to be taken from the same universe. The operational
procedures in fitting a normal curve are described in steps 1 to 7 in Article
9, and the uses of the resulting information are described in Article 10.



CHAPTER 7

FURTHER DESCRIPTIVE DEVICES


1. INTRODUCTION 

One of the purposes of statistics is that of describing a distribution in
precise mathematical terms, in order that the investigator may compare
it with other distributions. This function has been introduced in Chapter
3, with the arithmetic mean and the standard deviation (both of which
are numerical descriptions of properties of distributions) and will be
continued in this chapter.

This descriptive function is not the central purpose of statistics as it is
treated in this book and is, furthermore, not necessary in the logical
development of this central purpose. We have come, in other words, to a
branching in the objectives of elementary statistics. The present chapter
is one of the branches and is complete in itself, while in the following
chapter we will return to the development of the primary objective. It
is suggested, therefore, that if you do not have time to study all the topics,
you should omit this chapter entirely and go at this point directly to
Chapter 8.


2. PROPERTIES OF FREQUENCY DISTRIBUTIONS 

We have seen in Chapter 6 that many distributions occurring in practice
closely resemble the normal curve and that such distributions can be
described by giving the arithmetic mean and the standard deviation.
Many other distributions resemble the normal curve in a general way in
that they rise smoothly from zero to a single maximum and then decline
smoothly to zero again. It is for such distributions that the descriptive
methods in this chapter are chiefly useful. If the distribution differs
completely from the normal curve, for example if it has more than one distinct
maximum, then a description of the sort we are about to undertake is
not very useful, and the best description of the distribution is the
presentation of the frequency tabulation itself.

If the distribution has a general resemblance to the normal curve, then
the most obvious property to be described is the central tendency, or the
value around which the variates are centered. We have already studied
two quantities which describe the central tendency (the arithmetic mean
and the median), and we will describe others in this chapter.

Any description of central tendency is, however, incomplete because of
the obvious fact that two distributions can have the same central tendency
and yet be very different in other ways. The most obvious difference is
in dispersion, or scatter of the variates around the mean. This is a property
which we have already described by means of the standard deviation,
which tells you whether the variates are widely scattered or whether they
are closely clustered around the mean. We will describe other methods of
measuring the property of dispersion in this chapter.



Figure 7-2-1. Extreme Negative Skewness.
Figure 7-2-2. Moderate Positive Skewness.


If two distributions agree in central tendency and dispersion, they can
still differ, but in less obvious ways, and we must next describe these
further differences. It is possible, for example, for one of the distributions
to be unsymmetrical, that is, for its histogram to descend more steeply on
one side than on the other. This property is described as skewness, and
methods for measuring it will be described in the following articles.
Distributions showing varying degrees of skewness are shown in Figures
7-2-1 and 7-2-2.

Now we must raise the question of whether further important differences
can occur between distributions which are alike in central tendency,
dispersion, and skewness. An inspection of Figures 7-2-3 and 7-2-4 will



Figure 7-2-3. A Leptokurtic Distribution.
Figure 7-2-4. A Platykurtic Distribution.

show that such a difference is possible. Figure 7-2-3 differs from Figure
7-2-4 in two ways: first, relatively more of its variates are in the central
class (which tends to decrease the standard deviation); second, relatively
more of its variates are in the extreme wings (which tends to increase the
standard deviation). If these two effects are balanced, the two
distributions will have equal standard deviations. Thus we see that, without
altering the standard deviation of a distribution, we can build up the
center and the extreme wings at the expense of the “shoulders” of the
curve. This property of central peakedness is called kurtosis. The primary
objective of this chapter is to explain methods for the numerical
measurement of central tendency, dispersion, skewness, and kurtosis, for any
distribution.


3. MOMENTS 

In Article 3, Chapter 4, we introduced the idea of a deviation from the
arithmetic mean, or \(x - \bar{x}\). It can readily be shown that the arithmetic
mean of these deviations is always zero. In the same article, we introduced
the concept of the mean of the squares of those deviations, or
\(\overline{(x - \bar{x})^2}\), from which we obtained the standard deviation.
Both of these are special cases of quantities which are called moments.
The nth moment of a distribution around its mean is the mean of the nth
power of the deviations of the variates from the mean. Thus the second
moment is \(\overline{(x - \bar{x})^2}\), the third moment is
\(\overline{(x - \bar{x})^3}\), and the fourth moment is
\(\overline{(x - \bar{x})^4}\). The computation of these moments from the
definition is demonstrated in Table 7-3-1.


Table 7-3-1 Computation of Moments

\(x\)      \(f\)    \(x - \bar{x}\)   \(f(x - \bar{x})\)   \(f(x - \bar{x})^2\)   \(f(x - \bar{x})^3\)   \(f(x - \bar{x})^4\)
10           1         -7                 -7                    49                   -343                   2401
12           2         -5                -10                    50                   -250                   1250
14           5         -3                -15                    45                   -135                    405
16          11         -1                -11                    11                    -11                     11
18          25         +1                +25                    25                    +25                     25
20           6         +3                +18                    54                   +162                    486
Sum         50                             0                   234                   -552                   4578
Sum/50                                     0                  4.68                 -11.04                  91.56

Our purpose in introducing the concept of moments is to investigate
their utility as devices for describing distributions. A study of Table
7-3-1 shows that the negative deviations just balance the positive ones
(as of course they must), but that since the distribution is unsymmetrical
this balance is achieved by balancing a few large deviations on one side
against many small ones on the other. When these deviations are cubed,
the large ones become relatively more important, so that the third moment
is negative. Thus we see that the third moment is strongly affected by the
asymmetry of the distribution and might be used as a measure of
asymmetry. It is not, however, a satisfactory measure of pure skewness, because
its size is affected by the size of the units in which x is measured. This is
the same problem which confronted us in standardizing the
representation of a distribution in Article 3 of Chapter 6, and we can solve it in the
same way, by using t units instead of x units. The third moment in t units
is called \(\alpha_3\):

\[ \alpha_3 = \overline{t^3} \]    (7-3-1)

For purposes of computation, it is convenient to express this in terms of
x, which we can do by substituting for t its value \((x - \bar{x})/\sigma\) and
simplifying:

\[ \alpha_3 = \overline{(x - \bar{x})^3}/\sigma^3 \]    (7-3-2)

For the distribution in Table 7-3-1, the value of \(\alpha_3\) is
\(-11.04/(\sqrt{4.68})^3\), or -1.09. It is obvious that a negative value for
\(\alpha_3\) indicates that the distribution has a “tail” in the direction of
negative values of x, as in Figure 7-2-1, while a positive value indicates a
tail extending in the direction of positive values of x, as in Figure 7-2-2.
The numerical value of \(\alpha_3\) ranges from about -1 to +1 for the most
extreme examples of skewness generally found in practical problems.
Figure 7-2-2 demonstrates a moderate positive skewness
(\(\alpha_3 = +0.2\)), while Figure 7-2-1 demonstrates a very extreme
negative skewness (\(\alpha_3 = -1.09\)).

The fourth moment (which is computed in the last column of Table
7-3-1) is obviously not affected by skewness, since all the fourth powers
will be positive. To see what property the fourth moment measures, we
must observe that the higher the power to which we raise each deviation
before we take the mean, the more we emphasize the larger deviations in
our result. If, therefore, we increase some variates which are already
large and decrease some which are already small (that is, if we increase
the kurtosis of the curve), the fourth moment will be increased even
though we adjust the changes in such a way as to leave the second moment
unchanged. The fourth moment, therefore, can be used as a measure of
kurtosis. To make it a pure measure, we again remove the effect of the
units in which x was measured by using t units instead of x units. The
fourth moment, in t units, is called \(\alpha_4\):

\[ \alpha_4 = \overline{t^4} \]    (7-3-3)

which we again rewrite in x units for convenience of computation:

\[ \alpha_4 = \overline{(x - \bar{x})^4}/\sigma^4 \]    (7-3-4)

The value of \(\alpha_4\) for the distribution in Figure 7-2-3 is 3.6, and that for
Figure 7-2-4 is 2.2. It can be shown by advanced mathematics that \(\alpha_4\)
for the normal curve is exactly 3, and this should be used as a basis of
comparison in interpreting observed values of \(\alpha_4\). If \(\alpha_4\) is greater than 3,
as in Figure 7-2-3, the distribution is called leptokurtic, and if it is less
than 3, as in Figure 7-2-4, it is called platykurtic. In practice the value of
\(\alpha_4\) usually lies between 2 and 4, representing respectively curves which
are extremely flattened and extremely peaked in the center.
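
The definitions in t units can be applied literally: convert each variate to
t, raise it to the third or fourth power, and take the mean. The sketch below
does this for the distribution of Table 7-3-1. The function names alpha and
describe_kurtosis are our own.

```python
from math import sqrt

def alpha(values, freqs, order):
    """Mean of t**order, where t = (x - mean) / sigma."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, values)) / n
    sigma = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, values)) / n)
    return sum(f * ((x - mean) / sigma) ** order
               for f, x in zip(freqs, values)) / n

def describe_kurtosis(alpha_4):
    """Compare alpha-four with its value for the normal curve, which is 3."""
    if alpha_4 > 3:
        return "leptokurtic"
    if alpha_4 < 3:
        return "platykurtic"
    return "same as the normal curve"

xs = [10, 12, 14, 16, 18, 20]
fs = [1, 2, 5, 11, 25, 6]

alpha_3 = alpha(xs, fs, 3)
alpha_4 = alpha(xs, fs, 4)
print(f"alpha_3 = {alpha_3:.2f}")
print(f"alpha_4 = {alpha_4:.2f} ({describe_kurtosis(alpha_4)})")
```

The same function with order 2 returns 1, since the second moment in t
units is always 1.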


PROBLEMS 


1. Find \(\alpha_3\) and \(\alpha_4\) for the following distribution:

x    2    3    4    5    6
f    9    6    2    2    1


2. Without computing \(\alpha_4\), state whether you believe Figures 7-2-1 and 7-2-2
to be leptokurtic or platykurtic. Estimate the value of \(\alpha_4\) for each by comparing
them, in general appearance, with the normal curve and with the curves in Figures
7-2-3 and 7-2-4.


4. FORMULAS FOR RAPID COMPUTATION 

Equations 7-3-1 to 7-3-4 serve to define \(\alpha_3\) and \(\alpha_4\), but for rapid
computation other methods are better. To derive the necessary equations we
must extend the method which we used for \(\bar{x}\) and \(\sigma_x\) in Article 5 of Chapter
4, which should be reviewed at this point.
As before, we will define \(x_0\) as the class mark of any convenient class,
and u as the serial number of any class, starting from the \(x_0\) class and
increasing with increasing values of x, as shown in the third column of
Table 7-4-1. To compute \(\alpha_3\) and \(\alpha_4\), we must now express them in terms
of u and C, which are related to x by equations 4-5-2 and 4-5-4:

\[ x = x_0 + Cu \quad \text{and} \quad \bar{x} = x_0 + C\bar{u} \]

If we substitute these for \(x\) and \(\bar{x}\) in equation 7-3-2 we will have

\[ \alpha_3 = \overline{(x - \bar{x})^3}/\sigma_x^3 = C^3\,\overline{(u - \bar{u})^3}/(C^3\sigma_u^3) \]

where we have replaced \(\sigma_x\) by \(C\sigma_u\), according to equation 4-5-7. Now let
us cancel the C's and expand the cubed quantity:

\[ \alpha_3 = \overline{(u^3 - 3u^2\bar{u} + 3u\bar{u}^2 - \bar{u}^3)}/\sigma_u^3 \]

If we now separate this into four separate means, we note that the last
two of these will both involve the cube of \(\bar{u}\), and so can be combined,
giving us

\[ \alpha_3 = (\overline{u^3} - 3\,\overline{u^2}\,\bar{u} + 2\bar{u}^3)/\sigma_u^3 \]    (7-4-1)

Similarly,

\[ \alpha_4 = \overline{(x - \bar{x})^4}/\sigma_x^4 = C^4\,\overline{(u - \bar{u})^4}/(C^4\sigma_u^4) \]

\[ \alpha_4 = \overline{(u^4 - 4u^3\bar{u} + 6u^2\bar{u}^2 - 4u\bar{u}^3 + \bar{u}^4)}/\sigma_u^4 \]

or finally,

\[ \alpha_4 = (\overline{u^4} - 4\,\overline{u^3}\,\bar{u} + 6\,\overline{u^2}\,\bar{u}^2 - 3\bar{u}^4)/\sigma_u^4 \]    (7-4-2)

The use of these equations for the computation of \(\alpha_3\) and \(\alpha_4\) is shown in
Table 7-4-1, which is self-explanatory.


Table 7-4-1 Rapid Procedure for \(\alpha_3\) and \(\alpha_4\)

Limits      \(f\)    \(u\)    \(fu\)    \(fu^2\)    \(fu^3\)    \(fu^4\)
100-119       2      -2       -4         8        -16         32
120-139      11      -1      -11        11        -11         11
140-159      16       0        0         0          0          0
160-179      14      +1      +14        14        +14         14
180-199       5      +2      +10        20        +40         80
200-219       2      +3       +6        18        +54        162
Sum          50              +15        71        +81        299
Sum/N                      +0.30      1.42      +1.62       5.98

\[ \bar{u} = +0.30 \qquad \overline{u^2} = 1.42 \qquad \overline{u^3} = +1.62 \qquad \overline{u^4} = 5.98 \qquad \sigma_u = 1.15 \]

\[ \alpha_3 = [1.62 - 3(1.42)(0.30) + 2(0.30)^3]/(1.15)^3 = 0.26 \]

\[ \alpha_4 = [5.98 - 4(1.62)(0.30) + 6(1.42)(0.30)^2 - 3(0.30)^4]/(1.15)^4 = 2.70 \]
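
The rapid procedure lends itself to a direct check. The Python sketch below
forms the four means of powers of u from the f and u columns of Table
7-4-1 and substitutes them in equations 7-4-1 and 7-4-2. It assumes the
usual relation that the square of sigma_u equals the mean of u squared
minus the square of the mean of u, and it carries sigma_u unrounded.

```python
from math import sqrt

# The f and u columns of Table 7-4-1.
freqs = [2, 11, 16, 14, 5, 2]
u = [-2, -1, 0, 1, 2, 3]
n = sum(freqs)

def mean_power(k):
    """Mean of u**k over the whole distribution."""
    return sum(f * ui ** k for f, ui in zip(freqs, u)) / n

u1, u2, u3, u4 = (mean_power(k) for k in (1, 2, 3, 4))
sigma_u = sqrt(u2 - u1 ** 2)  # assumed shortcut for the standard deviation

alpha_3 = (u3 - 3 * u2 * u1 + 2 * u1 ** 3) / sigma_u ** 3              # 7-4-1
alpha_4 = (u4 - 4 * u3 * u1 + 6 * u2 * u1 ** 2 - 3 * u1 ** 4) / sigma_u ** 4  # 7-4-2

print(f"means of u, u^2, u^3, u^4: {u1:.2f}, {u2:.2f}, {u3:.2f}, {u4:.2f}")
print(f"sigma_u = {sigma_u:.2f}")
print(f"alpha_3 = {alpha_3:.2f}, alpha_4 = {alpha_4:.2f}")
```

Neither the class interval C nor the starting class mark enters the
computation, which is the point of the method.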


PROBLEMS 

In the following problems, you can avoid unnecessary duplication of effort if
you use the work which you have already done on the problems in Article 4 of
Chapter 4.

1. Compute the values of \(\alpha_3\) and \(\alpha_4\) for the two sets of temperatures which you
obtained in Problem 1, Article 3, Chapter 2. Do these values, together with the
arithmetic means and the standard deviations, adequately describe the differences
between these two distributions?

2. Compute \(\alpha_3\) and \(\alpha_4\) for the distribution in Table 4-2-1.

3. Compute \(\alpha_3\) and \(\alpha_4\) for the ages of the dementia praecox patients described
in Table 2-5-1. (Use a frequency tabulation with classes 15 to 19, 20 to 24, and
so forth.)

4. Write a complete derivation of equation 7-4-2, supplying and explaining the
steps which have been omitted.


5. MEASURES OF CENTRAL TENDENCY

In describing the central tendency of a distribution, we present a single
value which is representative, or typical, so far as possible, of all the
variates. Since there are various purposes for which such a typical value
might be used, there are slightly different ways of selecting it. Depending
upon the purpose for which it is to be used, any of the following measures
might be useful.

I. The Arithmetic Mean

This is a representative value in the following sense: If all the variates
had this value, the sum of all the variates would be unchanged. If we
know that an office has thirty-eight employees and if we know that the
arithmetic mean of the weekly salaries is $41, then we can compute the
total weekly payroll by multiplying these two figures together.

II. The Median

The median is the fiftieth percentile, and is computed by the procedure
shown in Chapter 2, Article 6. The special property which the median
possesses is the fact that exactly as many variates lie above the median
as lie below it. For some purposes, this may be a more useful “typical
value” than the arithmetic mean. Suppose, for instance, that a student
is considering a given profession, and wishes to know the average income
which people in that profession earn. If the distribution of income is
skewed to the right, that is, if it consists of a large number of small incomes
very tightly bunched, plus a very few incomes which are very high, then
the mean will be much larger than the median. To take an extreme case,
let us suppose that the mean is $6000 and the median is $4000; in this
case the student may reason as follows: “$6000 would be sufficient for
my needs, if I had a reasonable chance of obtaining it. But in view of the
skewness of the distribution, this mean is reached by balancing a high
probability of receiving a slightly smaller income against a low probability
of receiving a very much higher income. And in the light of my needs these
two do not balance; a smaller income would be a serious handicap, while a
very large income would exceed my needs and would have less value per
dollar to me than a moderate income. For my purposes it is more useful to
know the income which I have a fifty per cent chance of reaching.” This
income is of course the median.

III. The Mode

The mode is the value of the variate for which the frequency curve
reaches a maximum; it is therefore the most probable value of the variate.
For most purposes the mode can be computed with adequate accuracy by
finding the class mark of the class which contains the highest frequency.
Using the symbol Z for the mode, we can abbreviate this definition as
follows:

\[ Z \cong CM_{max} \]    (7-5-1)

where \(CM_{max}\) refers to the class mark of the class with maximum
frequency, and where the symbol \(\cong\) means “is approximately equal to.”
Thus for the distribution in Figure 1-4-1 the mode is 80 feet, and for 1-4-2
it is 14.5 per cent.

A little reflection, however, will show that if we plot the frequencies of a
distribution and then draw a smooth curve through the points, the
maximum value will lie a little to one side of the highest plotted point, in the
direction of whichever point is next highest. An example of this is shown
in Figure 7-2-2. It is reasonable to assume that the exact position of the
maximum point is determined by the relative sizes of the two adjacent
frequencies, and that if, for example, the frequency of the class to the
right of the maximum class is twice that of the class to the left, then the
maximum will be twice as far from the left edge of the class as it is from
the right. This assumption is adopted in the customary definition of the
mode:

\[ Z = LB_{max} + \frac{UF}{UF + LF}\, C \]    (7-5-2)

where \(LB_{max}\) is the lower (or left hand*) boundary of the class containing
the maximum frequency, UF is the upper frequency, that is, the frequency
of the class to the right of the maximum class, LF is the lower frequency,
or the frequency of the class to the left of the maximum class, and C is the
class interval. For the distribution in Table 7-3-1, \(LB_{max}\) is 17, UF is 6,
LF is 11, and C is 2. Thus we find that the mode as given by 7-5-2 is
17.7, as compared to the approximate value of 18 by 7-5-1.
The mode is useful whenever we wish to know which value of the variate
is most likely to occur. If, for example, a department store offers a special
sale of women’s stockings, but does not have room to stock all sizes, then
the best sizes to stock would be those near the mode, since these would
fit a larger number of customers than any others.
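
Both definitions of the mode can be sketched in Python as follows. The
function names are our own, and the sketch assumes equally spaced class
marks, so that the lower boundary of a class lies half a class interval
below its class mark. It also assumes that the maximum class has at least
one neighboring class with a nonzero frequency.

```python
def mode_approximate(class_marks, freqs):
    """Equation 7-5-1: class mark of the class with the highest frequency."""
    return class_marks[freqs.index(max(freqs))]

def mode_interpolated(class_marks, freqs, class_interval):
    """Equation 7-5-2: shift the mode toward the larger neighboring class."""
    i = freqs.index(max(freqs))
    lower_boundary = class_marks[i] - class_interval / 2
    lower_freq = freqs[i - 1] if i > 0 else 0
    upper_freq = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lower_boundary + upper_freq / (upper_freq + lower_freq) * class_interval

# The distribution of Table 7-3-1.
xs = [10, 12, 14, 16, 18, 20]
fs = [1, 2, 5, 11, 25, 6]

print(mode_approximate(xs, fs))
print(round(mode_interpolated(xs, fs, class_interval=2), 1))
```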

IV. The Geometric Mean

This is defined as the antilog of the mean of the logs of the variates:

\[ GM = \text{antilog}\,(\overline{\log x}) \]    (7-5-3)

To find the geometric mean of 3, 4, and 6, for example, we must find the
logs of these three numbers (0.4771, 0.6021, and 0.7782), then find the
mean of these logs (0.6191) and then find the antilog of this mean (4.160).
The geometric mean is sometimes useful in problems in which a variable
increases geometrically with time, as described in Article 6, Chapter 3.
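
The same three steps (logs, mean of the logs, antilog) can be written out in
Python. This is a minimal sketch using common logarithms, as in the
example above; the function name geometric_mean is our own.

```python
from math import log10

def geometric_mean(values):
    """Antilog of the mean of the common logs (equation 7-5-3)."""
    mean_log = sum(log10(x) for x in values) / len(values)
    return 10 ** mean_log

print(round(geometric_mean([3, 4, 6]), 3))
```

The result is the same as the cube root of the product 3 x 4 x 6, which is
the other common way of stating the definition.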


*We assume that larger values of x are plotted to the right.

V. The Harmonic Mean

The reciprocal of a number is equal to one divided by that number.
Thus the reciprocal of 6 is 1/6 and the reciprocal of 3/8 is 8/3. The
harmonic mean of a set of numbers is the reciprocal of the mean of their
reciprocals:

\[ HM = \frac{1}{\overline{(1/x)}} \]    (7-5-4)

Thus to find the harmonic mean of 3, 4, and 6, we find their reciprocals
(1/3, 1/4, and 1/6), then find the mean of these (1/4), and then find the
reciprocal of this mean, which is 4.

Suppose that a man drives 120 miles at 60 miles per hour, and then
makes the return trip at 40 miles per hour; what is his average speed?
The question, of course, has no exact meaning until we specify the kind
of average, but we can study the relative utility of the various kinds of
averages. We might for example require that the average must have
the property that the total time for the trip would have been the same if he
had driven at the average speed throughout. With this requirement in mind,
we can readily see that the arithmetic mean (50 mph) is not suitable, for
if he drives 240 miles at this speed he will require 4.8 hours, while the
actual trip required five hours. Instead we must divide the total mileage
(240) by the total time actually consumed (5 hours), and we find that the
required average speed is 48 miles per hour. An inspection of the actual
computations which we have made here shows that we have, in effect,
computed the harmonic mean of 40 and 60.
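
The claim that the trip computation is a harmonic mean in disguise can be
checked numerically. In the sketch below the function name harmonic_mean
and the variable names are our own.

```python
def harmonic_mean(values):
    """Reciprocal of the mean of the reciprocals (equation 7-5-4)."""
    return len(values) / sum(1 / x for x in values)

# The round trip: 120 miles out at 60 mph, 120 miles back at 40 mph.
distance_each_way = 120
speeds = [60, 40]

total_time = sum(distance_each_way / s for s in speeds)
average_speed = 2 * distance_each_way / total_time

print(harmonic_mean([3, 4, 6]))
print(total_time, average_speed, harmonic_mean(speeds))
```

The agreement depends on the two legs being of equal length; with unequal
distances each speed would have to be weighted by the miles driven at it.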


PROBLEMS 

1. Compute the arithmetic mean, the median, the geometric mean, and the
harmonic mean of the numbers 2, 3, 4, and 12.

2. Find the mode of the distribution in Table 2-3-2, using first equation 7-5-1
and then equation 7-5-2.


6. MEASURES OF DISPERSION 

The following measures of dispersion are widely used in numerical
descriptions of distributions.

I. Standard Deviation

The deviation of any variate from the mean is the difference between
that variate and the mean. The square root of the mean of the
squares of these deviations is the standard deviation. In addition to its
usefulness as a descriptive device, this quantity is used extensively in the
development of the theory of statistics, and so has already been introduced
in Chapter 4.

II. Mean Deviation

The absolute value of a number is the size of the number, without regard
to whether it is positive or negative. It is denoted by placing vertical
bars to the left and the right of the number. For example, |-3| is 3,
and |6| is 6. The mean deviation is the mean of the absolute values of
the deviations:

\[ MD = \overline{|x - \bar{x}|} \]    (7-6-1)

Thus to find the mean deviation of the numbers 3, 4, and 8, we find the
deviation of each number from the mean (-2, -1, and +3), and find the
mean of the corresponding absolute values (2, 1, and 3), which we find to
be 2. The mean deviation is usually a little smaller than the standard
deviation, since in finding the standard deviation we square all the
deviations and thus give more weight to the larger ones.
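
The comparison between the two measures can be seen on the same three
numbers. The sketch below is illustrative only; the function names are our
own.

```python
from math import sqrt

def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean (7-6-1)."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

def standard_deviation(values):
    """Square root of the mean of the squared deviations."""
    mean = sum(values) / len(values)
    return sqrt(sum((x - mean) ** 2 for x in values) / len(values))

numbers = [3, 4, 8]
print(mean_deviation(numbers))
print(round(standard_deviation(numbers), 2))
```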

III. Variance

In finding the standard deviation we begin by squaring all the
deviations and finding the mean of the result. If we stop here, we have a
quantity which is called the variance and which is frequently used to
describe the dispersion of the frequency distribution. Its formal
definition is

\[ \text{Variance} = \overline{(x - \bar{x})^2} \]    (7-6-2)

For example, the variance of the distribution in Table 4-3-1 is 8.33. From
its definition it is obvious that the variance is the square of the standard
deviation.

IV. Quartile Deviation

In Article 6, Chapter 2, we studied the first quartile and the third
quartile, which are defined in such a way that one-fourth of the variates
are below the first quartile, and three-quarters are below the third quartile.
The interval between these two quartiles is sometimes called the
interquartile range. The quartile deviation is defined as half of the
interquartile range:

\[ Q = (Q_3 - Q_1)/2 \]    (7-6-3)

This is very similar to the probable error, as defined in Article 8 of Chapter
6. The quartile deviation has the following property: If a variate is
chosen at random, the probability is one-half that it will differ from the
mean by less than the quartile deviation.

PROBLEMS 

1. Find the mean deviation for the data in Table 4-3-1.

2. Find the mean deviation for the data in Table 4-3-2.

3. What is the variance of the distribution in Table 4-3-1? In Table 4-3-2?
In Table 4-5-1?

4. Find the quartile deviation of the distribution shown in Figure 2-6-3. How
does this compare with the probable error of the distribution as defined by
equation 6-8-1?


7. MEASURES OF SKEWNESS 

The following quantities are all widely used as descriptive
measurements of skewness.

(a) Alpha sub-three. The quantity \(\alpha_3\), which we described in
Article 3, is an exact descriptive device which should be used when
making a critical comparison between the properties of distributions.
For less exact comparisons, any of the following equations may be used.

(b) \( Sk = (Q_3 + Q_1 - 2Q_2)/(Q_3 - Q_1) \)    (7-7-1)

This equation depends upon the fact that if the distribution is skewed, the
second quartile will not be midway between the first and third quartiles.
The skewness is obtained by subtracting one of these intervals \((Q_2 - Q_1)\)
from the other \((Q_3 - Q_2)\) and then dividing by the entire interval
\((Q_3 - Q_1)\).

This quantity measures the skewness in only the middle portion of the
curve and is not affected by the shape of the wings beyond the first and
third quartiles.

(c) \( Sk = (P_{90} + P_{10} - 2P_{50})/(P_{90} - P_{10}) \)    (7-7-2)

This equation takes into account more of the distribution than 7-7-1, but
is not affected by the location of the variates below \(P_{10}\) or above \(P_{90}\).

(d) \( Sk = 3(\bar{x} - M)/\sigma \)    (7-7-3)

(e) \( Sk = 2(\bar{x} - Z)/\sigma \)    (7-7-4)

These equations use the fact that the median (M) and the mode (Z)* are
both displaced from the mean if the distribution is skewed.
Equations b, c, d, and e all give values for the skewness which agree
in sign with \(\alpha_3\), but which cannot be directly compared with it in size.
For example, the skewness of Figure 7-2-1 as measured by these
equations ranges from -0.3 to -0.7, while the value of \(\alpha_3\) is -1.1.
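
The four approximate measures are simple enough to state as Python
functions. The sketch below takes the quartiles, percentiles, median, and
mode as already computed; the function names are our own, and the
quartile values 10, 14, and 30 are hypothetical numbers chosen only to
show the sign convention. The last example uses the mean (17), the
standard deviation (the square root of 4.68), and the mode (17.7) found
earlier for the distribution of Table 7-3-1.

```python
from math import sqrt

def sk_quartile(q1, q2, q3):
    """Equation 7-7-1: asymmetry of the quartiles about the median."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

def sk_percentile(p10, p50, p90):
    """Equation 7-7-2: the same idea, using the 10th and 90th percentiles."""
    return (p90 + p10 - 2 * p50) / (p90 - p10)

def sk_median(mean, median, sigma):
    """Equation 7-7-3: displacement of the median from the mean."""
    return 3 * (mean - median) / sigma

def sk_mode(mean, mode, sigma):
    """Equation 7-7-4: displacement of the mode from the mean."""
    return 2 * (mean - mode) / sigma

# Hypothetical quartiles: the median is nearer the first quartile, so the
# tail extends toward larger values and the measure is positive.
print(sk_quartile(10, 14, 30))

# Table 7-3-1: the mode lies above the mean, so the measure is negative.
print(round(sk_mode(17, 17.7, sqrt(4.68)), 2))
```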

PROBLEMS 

1. Compute the skewness of the ages of dementia praecox* patients in Table
2-5-1, using equation 7-7-1.

2. Repeat Problem 1, using equation 7-7-2. Comment on the difference between
these two results.

3. Repeat Problem 1 again, using first equation 7-7-3 and then equation 7-7-4.


*The mode should be computed by the exact equation (7-5-2) if it is to be used for
this purpose.

*Use the ogive which you constructed for Problem 4, Article 6, Chapter 2.

8. MEASURES OF KURTOSIS 

The following quantities are used to describe the kurtosis of a
distribution.

(a) Alpha sub-four. The quantity \(\alpha_4\), which we described in Article
3, should be used whenever an exact comparison of kurtosis between two
distributions is to be made. For less precise comparisons, the following
equation can be used.

(b) \( Ku = Q/(P_{90} - P_{10}) \) where \( Q = (Q_3 - Q_1)/2 \)    (7-8-1)

The kurtosis as obtained by this equation has no direct relationship with
\(\alpha_4\), and cannot be compared with it. The following information will be
useful in interpreting the size of this measurement: For Figure 7-2-3
the kurtosis as given by this equation is 0.17, for the normal curve it is
0.26, and for the curve in Figure 7-2-4 it is 0.31.
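
The value quoted for the normal curve can be verified from the quartiles
and percentiles of the normal distribution itself. The sketch below uses the
standard library's NormalDist in place of a printed table; the function
name kurtosis_ratio is our own.

```python
from statistics import NormalDist

def kurtosis_ratio(q1, q3, p10, p90):
    """Equation 7-8-1: quartile deviation divided by the P10-to-P90 range."""
    quartile_deviation = (q3 - q1) / 2
    return quartile_deviation / (p90 - p10)

unit_normal = NormalDist()  # mean 0, standard deviation 1, so x is in t units
ku_normal = kurtosis_ratio(
    unit_normal.inv_cdf(0.25),
    unit_normal.inv_cdf(0.75),
    unit_normal.inv_cdf(0.10),
    unit_normal.inv_cdf(0.90),
)
print(round(ku_normal, 2))
```

Because the measure is a ratio of two distances along the x axis, it is
unchanged by the mean and standard deviation of the normal curve used.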

PROBLEM 

1. Compute the kurtosis of the distribution of dementia praecox patients in
Table 2-5-1, using equation 7-8-1. How does this value compare with that of the
normal curve?


9. SUMMARY 

Chapter 7 deals entirely with the use of statistical techniques for
formulating exact descriptions of distributions. The descriptions are expressed
in terms of numerical measurements of four basic properties, as follows:

1. Central tendency, or “most typical value.” The various methods
for the measurement of this property are as follows:

(a) Arithmetic mean, defined by equation 3-10-1.

(b) Median. This is the fiftieth percentile, which can be computed by
the procedure described in Chapter 2, Article 6.

(c) Mode, or position of maximum likelihood. This can be computed
approximately by equation 7-5-1, or more exactly by equation 7-5-2.

(d) The geometric mean. Defined and computed by equation 7-5-3.

(e) The harmonic mean. Defined and computed by equation 7-5-4.

The uses of these various measurements of central tendency are
described in Article 5.

2. Dispersion, or amount of scatter of the observations. This property
is measured by any of the following methods:

(a) Standard deviation. Defined by equation 4-3-2 and computed by
means of equation 4-4-1, 4-4-3, or 4-5-5.

(b) Mean deviation. Defined and computed by equation 7-6-1. It
is similar to standard deviation.

(c) Variance. This is defined by equation 7-6-2. It is simply the
square of the standard deviation.

(d) Quartile deviation. Defined by equation 7-6-3. It is usually
about two-thirds of the standard deviation.

3. Skewness, or asymmetry. This property can be measured by any
of several methods, which are independent of each other except that they
all are negative when the “tail” of the distribution extends in the
direction of smaller values. The various methods are:

(a) Alpha sub-three. Defined by equation 7-3-1 or 7-3-2, but best
computed in general by equation 7-4-1. The meaning of $\alpha_3$ is
indicated by Figures 7-2-1 and 7-2-2.

(b) Measures depending upon asymmetry in the locations of
percentiles. Equation 7-7-1 measures the asymmetry of the central
portion of the distribution, while equation 7-7-2 measures the
asymmetry over a wider range.

(c) Measures depending upon the displacement of the mean relative
to the median (equation 7-7-3), or relative to the mode (equation
7-7-4).

4. Kurtosis, or central peakedness as compared with the normal curve.
This is measured by either of the following:

(a) Alpha sub-four. Defined by equation 7-3-3 or 7-3-4 but usually
computed by the more rapid equation 7-4-2. The meaning of
the resulting value of $\alpha_4$ is indicated by Figures 7-2-3 and 7-2-4,
and by the fact that for the normal curve $\alpha_4$ is exactly three.

(b) Equation 7-8-1 gives a quick measure of kurtosis. Its scale
(not the same as that of $\alpha_4$) is indicated in Article 8.
SIMPLE CURVE FITTING


1. EXAMPLE OF CURVE FITTING 

Curve fitting is the operation of finding the equation of that curve
which will best represent the relationship between two statistical variables.
The curve may be of any sort, but the simplest and most widely used is
the straight line, which is a particular kind of curve in the mathematical
sense. In order to focus our discussion of the purpose of curve fitting, let
us begin by describing a specific example of a fitted curve.

The stars around us are arranged in highly organized systems called
galaxies. Each galaxy contains tens of billions of stars, arranged usually
in a huge flat spiral structure. Our galaxy contains all the stars which we
can see and many billions more which are too faint to be seen. From
our off-center position we see our own galaxy mostly to one side of us,
forming the Milky Way. Beyond the edges of our own galaxy we see
many other galaxies, more or less like our own, out to the farthest distance
that the most powerful telescope can reach. It is possible to measure the
distance to some of these galaxies, and by spectroscopic means it is possible
to tell whether they are approaching us or receding from us, and with what
speed. Some results of these remarkable measurements are given in
Table 8-1-1.

It is seen from this table that the galaxies are receding from us, or we
from them, with extraordinarily high velocities (the velocity of a bullet,
for example, is in the neighborhood of only one mile per second), and that
the velocities of recession are related to the distances in a very systematic
way. This relationship becomes even more conspicuous if we plot the
data, as we can see from Figure 8-1-1. In this figure we see that the points
all fall nearly along a straight line. The straight line in the figure has been
drawn in by inspection, that is, by juggling a ruler around on the drawing
until it appeared to represent most of the points as well as possible. The
equation of the line is

Velocity = 18.9 × Distance (8-1-1)
Table 8-1-1 Distances and Velocities of Galaxies

Galaxy Group          Distance in Billions      Velocity in Miles
                      of Billions of Miles      per Second

Virgo                          40                      750
Pegasus                       130                     2400
Perseus                       210                     3200
Coma                          270                     4700
Ursa Major No. 1              500                     9300
Leo                           610                    12400
Gemini No. 1                  670                    14300
Bootes                       1340                    24200
Ursa Major No. 2             1380                    26100


More precise methods of finding a line to represent the data will be
discussed in this chapter, but first let us consider the uses to which such a
line can be put.
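As a rough check on the line drawn by inspection, we can divide each velocity in Table 8-1-1 by the corresponding distance and see how closely the ratios cluster around the slope 18.9. This is only an illustrative sketch; the averaging of the ratios is our own shortcut and is not the fitting method developed later in the chapter.

```python
# Table 8-1-1: (distance in billions of billions of miles,
#               velocity in miles per second)
GALAXIES = {
    "Virgo": (40, 750),
    "Pegasus": (130, 2400),
    "Perseus": (210, 3200),
    "Coma": (270, 4700),
    "Ursa Major No. 1": (500, 9300),
    "Leo": (610, 12400),
    "Gemini No. 1": (670, 14300),
    "Bootes": (1340, 24200),
    "Ursa Major No. 2": (1380, 26100),
}

# Velocity divided by distance for each galaxy group.
ratios = {
    name: velocity / distance
    for name, (distance, velocity) in GALAXIES.items()
}
mean_ratio = sum(ratios.values()) / len(ratios)

for name, ratio in ratios.items():
    print(f"{name:<18}{ratio:6.2f}")
print(f"{'mean ratio':<18}{mean_ratio:6.2f}")
```

Every ratio falls between 15 and 22, which is consistent with a single straight line through the origin representing all nine groups reasonably well.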



Figure 8-1-1. Velocities of the Galaxies. 


2. PURPOSES OF CURVE FITTING

Curve fitting is an operation which is used widely by scientists,
educators, psychologists, sociologists, test administrators, and others. The
purpose for which the procedure is used may be any of several, depending
somewhat upon the field which is under investigation. The primary
purposes will be described here, and illustrated where appropriate by
means of the example in Article 1.

I. Estimating 

The direct methods for finding distances to galaxies fail when
astronomers try to apply them to the most distant galaxies which can be observed.
It is however possible to find the velocities of some of them even at these
very great distances. By inserting these velocities into equation 8-1-1,
we can compute the distance to each of these galaxies. For example, if
a galaxy is too far away for its distance to be measurable, but is known
to have a velocity of 38,000 miles per second, as determined from
spectroscopic measures, we can insert 38,000 for “velocity” into equation 8-1-1
and compute its approximate distance:

38,000 = 18.9 × Distance

or Distance = 2010 billion billion miles

The distance deduced in this way is obviously less reliable than a distance
which has been measured directly. We will discuss later the factors which
affect the accuracy of such estimates and the way in which their precision
can be estimated.
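The arithmetic of this estimate can be sketched in a few lines. The function name `estimate_distance` and the constant `SLOPE` are our own labels for illustration; the slope itself is the value 18.9 from equation 8-1-1.

```python
SLOPE = 18.9  # miles per second per billion billion miles (equation 8-1-1)


def estimate_distance(velocity, slope=SLOPE):
    """Solve Velocity = slope * Distance for Distance."""
    return velocity / slope


distance = estimate_distance(38_000)
# Rounded to the nearest ten, as in the text.
print(round(distance / 10) * 10)  # 2010
```

The same function serves for any galaxy whose velocity is known, subject to the reservations about reliability stated above.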

II. Concise Description 

A single equation may express the essential facts from many pages of
data. If two sets of similar data are to be compared, the comparison can
be greatly facilitated if we begin by expressing both in the form of
equations. The essential differences and the essential similarities will then be
readily apparent.

III. Analysis of Causes

In any attempt to explain why changes in x produce or cause changes
in y, or are accompanied by them, the first step is to describe as precisely
as possible the relationship which we are trying to explain. The
formulation of an equation expressing this relationship is one of the clearest ways
to describe it.

In the case of the galaxies, the equation shows us that the velocities of
recession are directly proportional to the distances from us, and this is the
first step in understanding the physical situation which this implies,
namely, that our entire universe of galaxies must be expanding uniformly,
so that an observer on any galaxy must see all other galaxies receding
from him and that the farther away a galaxy is, the faster its distance
from the observer will be increasing.
IV. Measurement of Physical Quantities

The constants in the equation relating x and y are in some cases physical
quantities which the investigator wishes to determine. For example, the
well-known law for the pressure and volume of a gas at constant
temperature ($PV = C$, where C is a constant) is not quite exact because it ignores
the size of the molecules of the gas. A more exact law is*

$P(V - a) = C$ (8-2-1)

where a is a small quantity related to the size of the molecules. If a
number of very exact observations of pressure and volume of a gas are
made, and equation 8-2-1 is fitted to the observations by adjusting a and
C, the value of a so determined can be used to estimate the size of the
molecules of the gas.

V. Prediction 

For most users, the outstandingly important use of curve fitting is that
of estimating a value of y when x can be measured but y cannot. A very
important special kind of estimating arises when we establish an equation
relating early measures to later measures on the same object, and then use
the equations for predicting future behavior of other similar objects. This
use is of course logically identical with the first item in this list, but it will
be described separately because of its importance in statistical studies.

Let us suppose that for several years a school has kept records of the
scores made on entrance examinations by candidates for admission.
Suppose furthermore that the examining officials have kept track of the
examinees who were admitted to the school and have compiled their
subsequent grades in their courses. Then, if we let x equal the score made
on the entrance examination and y the subsequent grade average made by
the student, we can fit a curve to these two variables and use it to predict
the probable future grade average of any candidate now applying for
admission, if we have his entrance examination score. Such predictions
can obviously be of great value in screening applicants for admission to
schools or in screening candidates for employment. The procedure is
applicable to any situation in which we know some of the controlling
factors and wish to know their probable future effect; for example, it can
be used to predict the size of the nation's fall wheat crop if we know,
during the summer, the total rainfall, average temperature, and other
similar factors.

Obviously such predictions will be subject to error, since the causation
factors are usually far too numerous to be included in the equation, and
we must generally limit ourselves to the one or two causes which influence
the result the most. Fortunately the theory of statistics can be used to
determine the precision of a prediction from the same data used for making
the prediction itself. The precision of predictions will be discussed in later
chapters.


3. EQUATION OF A STRAIGHT LINE 

The simplest and most frequently used “curve” in curve fitting is the
straight line. Before proceeding to the fitting of straight lines, we must
review the fundamental equation of a straight line and the meaning of the
quantities contained in it.*

The equation of any straight line (unless it is parallel to the y axis) can
be written in the form

$y = mx + b$ (8-3-1)

where x and y are the coordinates of any point on the line, and m and b
are numbers which characterize the line and distinguish it from any other
straight line. For example, if we give m the value 2 and give b the value
3, the equation becomes $y = 2x + 3$, which is the equation of the specific
straight line shown in Figure 8-3-1. Similarly, $y = 5x + 7$ is a straight
line, with $m = 5$ and $b = 7$, and $y = -\frac{3}{4}x - 9$ is a straight line with
$m = -3/4$ and $b = -9$.

Figure 8-3-1. The Line y = 2x + 3

*If you are already familiar with the standard “slope intercept” form for the
equation of a straight line, you should omit Articles 3, 4, and 5.

Furthermore, $5x + 2y - 7 = x + y - 1$ is also a straight line, since we
can collect terms and reduce it to $y = -4x + 6$, which has the standard form.
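The collecting of terms in the last example can be sketched as a small routine. This is an illustration only: the function `to_slope_intercept` and its way of representing each side of the equation as a triple of coefficients are assumptions of ours, not notation from the text.

```python
from fractions import Fraction


def to_slope_intercept(left, right):
    """Reduce a1*x + b1*y + c1 = a2*x + b2*y + c2 to y = m*x + b.

    Each side is a tuple (x coefficient, y coefficient, constant term).
    Returns the pair (m, b) as exact fractions.
    """
    # Move everything to the left-hand side: a*x + by*y + c = 0.
    a = Fraction(left[0]) - Fraction(right[0])
    by = Fraction(left[1]) - Fraction(right[1])
    c = Fraction(left[2]) - Fraction(right[2])
    if by == 0:
        raise ValueError("y cancels out, so no form y = mx + b exists")
    return -a / by, -c / by


# 5x + 2y - 7 = x + y - 1
m, b = to_slope_intercept((5, 2, -7), (1, 1, -1))
print(m, b)  # -4 6
```

Exact fractions are used so that a slope such as -3/4 is reported without rounding.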


PROBLEMS 

Plot the following lines 

1 y = 2>x — 5 

2 y = — 2x + 1 

3 2x + Sy — ■ 2 = 5x — 7y — 8 


4. MEANING OF m AND b 


We can readily discover the meaning of b by examining the table of
values in Figure 8-3-1: b is the value which y takes on when x = 0. Or,
graphically, b is the height at which the line crosses the y axis. Since b is 3
in our illustrative equation, the line must cross the y axis 3 units above the
origin. In the equation $y = 2x - 5$, b is -5 and the line must therefore
cross the y axis at a height of -5, or five units below the origin.
The meaning of m is a little more complex. Suppose that we let the
coordinates $x_1$ and $y_1$ stand for any fixed point on the line. Then since
$(x_1, y_1)$ is on the line, it satisfies the equation of the line, that is,
$y_1 = mx_1 + b$. If $(x_2, y_2)$ is another point on the line, then
$y_2 = mx_2 + b$. If we subtract one of these equations from the other and
solve for m, we have


$m = \frac{y_2 - y_1}{x_2 - x_1}$ (8-4-1)


This equation makes it possible for us to find m if we know the coordinates
of any two points on the line. For example, if we know that a line goes
through points D and E in Figure 8-3-1, we can find m by calling E point
2 and D point 1, and applying equation 8-4-1:

$m = \frac{7 - 5}{2 - 1} = 2$

If we had made the choice in the opposite way, by choosing to call E point
1 and D point 2, the equation would have been a little different, but the
final result would have been the same.
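Equation 8-4-1 can be sketched as a short function. The coordinates assigned to D and E below are read off the worked example (D at x = 1, y = 5 and E at x = 2, y = 7); the function name `slope` is our own.

```python
def slope(point_1, point_2):
    """Equation 8-4-1: m = (y2 - y1) / (x2 - x1)."""
    x1, y1 = point_1
    x2, y2 = point_2
    if x2 == x1:
        raise ValueError("the line is parallel to the y axis; m is undefined")
    return (y2 - y1) / (x2 - x1)


D = (1, 5)
E = (2, 7)

print(slope(D, E))  # 2.0  (D is point 1, E is point 2)
print(slope(E, D))  # 2.0  (labels interchanged, same result)
```

Interchanging the labels reverses the sign of both numerator and denominator, which is why the quotient is unchanged.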



The term “slope” is frequently used for m, and the term has the same
meaning in mathematics that it has in road building, namely, the ratio of
“rise” to “run.” If the slope is positive, the line runs from lower left to
upper right; if negative, from upper left to lower right. If m is very small,
the line is nearly horizontal; if it is very large, the line is nearly vertical.
PROBLEMS

Compute the value of m for the line through each of the following pairs of points.

1 (3, 5) and (6, 17)

2 (-2, 4) and (3, -6)

3 (-1, -2) and (-3, 0)

4 (-3, 5) and (2, 5)

5. EQUATION OF A GRAPHED LINE 

The relations in the preceding paragraph can be used to discover the
equation of a line for which a graph has been made. The procedure is as
follows:

1. Extend the line if necessary to cut the y axis, and read the value of
y where it crosses. This value, with its plus or minus sign, is b.

2. Select any two points on the line and label them “1” and “2.” (It
is easy to make mistakes about signs here, and it is better to label the two
points on the graph than it is to try to remember which one is being called
“1” and which “2.”) Read the coordinates $x_1$ and $y_1$ from point 1, and
$x_2$ and $y_2$ from point 2, with their minus signs if they are negative, and
insert these values in equation 8-4-1 to obtain m.

3. Insert these values of m and b in equation 8-3-1.

Figure 8-5-1. Finding the Equation from a Graph.

For example, let us find the equation of line A in Figure 8-5-1. We first
observe the point at which the line crosses the y axis. The value of y at
this point is -2, which is therefore the value of b. Next we select two
points on the line and read their coordinates. Any two points will serve,
but if the numbers are large, it is obviously faster if we select points
differing in x by 10 or 20 or some other simple number. For the points
chosen in line A, $x_2$ is 5, $y_2$ is -4, $x_1$ is -5, and $y_1$ is 0. We insert
these in formula 8-4-1, and obtain the value of m for the line:

$m = \frac{-4 - 0}{5 - (-5)} = -\frac{4}{10}$

Inserting these values of m and b in 8-3-1, we have the equation of line A:

$y = -0.4x - 2$
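The three steps can be sketched as follows, using the readings taken from line A. The function name `line_from_graph` is ours, and the final loop is simply a check that both chosen points satisfy the resulting equation.

```python
def line_from_graph(y_intercept, point_1, point_2):
    """Return (m, b) for y = m*x + b from readings taken off a graph.

    y_intercept is the value of y where the line crosses the y axis (step 1);
    point_1 and point_2 are any two labeled points on the line (step 2).
    """
    x1, y1 = point_1
    x2, y2 = point_2
    m = (y2 - y1) / (x2 - x1)  # equation 8-4-1
    return m, y_intercept      # step 3: the constants of equation 8-3-1


# Line A: b = -2, point 1 is (-5, 0), point 2 is (5, -4).
m, b = line_from_graph(-2, (-5, 0), (5, -4))
print(m, b)  # -0.4 -2

# Both points should lie on y = m*x + b.
for x, y in [(-5, 0), (5, -4)]:
    assert abs((m * x + b) - y) < 1e-12
```

A check of this kind is a convenient guard against the sign mistakes mentioned in step 2.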



PROBLEMS 

1 Write the equation of line B in Figure 8-5-1.

2 Write the equation of line C in Figure 8-5-1.

3 Write the equation of line D in Figure 8-5-1.

6. CRITERION OF BEST FIT 

In general when a set of observed values of x and y are plotted, they will
not all lie on any straight line. In many cases the deviations arise from the
fact that y measures something which is dependent not only upon x but
also upon other variables. In other cases, y is exactly related to x, but the
investigator has not been able to measure y with adequate precision, so
that an unknown error of measurement is contained in each y. In either
case we can picture each value of y as being made up of two parts, one of
which is completely dependent upon x and the other of which is completely
independent of x. The completely dependent part is the part which we
can expect to predict exactly from a knowledge of x. Let us call it the
predictable part, or $y_p$. The remaining part, which is independent of x,
we will call $y_r$, or the random part of y. Then we have assumed that for
any y in our tables

$y = y_p + y_r$

or, if we solve this for $y_r$,

$y_r = y - y_p$ (8-6-1)

This random part of y is often called a deviation, since it is the amount by
which the observed value of y differs or deviates from the predicted value.
It is, in other words, the amount by which the best prediction would fail.

Now it is customary and reasonable, unless there is evidence to the
contrary, to assume that the random contributions (that is, the values of
$y_r$) would, if collected into a frequency tabulation, form a normal curve.
Let us call the standard deviation of this distribution $\sigma_r$. We may regard
this as a fixed number which would become known to us if we had
information about the contributions to y from other variables, but which is
unknown to us in practice. The relationship between $y$, $y_p$, and $y_r$ is
shown in the left half of Figure 8-6-1, and the nature of the assumed
distribution of the values of $y_r$ is indicated in the right half of the figure.




Figure 8-6-1. The Deviations from a Straight Line 

Since we assume that the values of $y_r$ are distributed normally, the
probability of occurrence of a $y_r$ of a given size is given by the equation
of the normal curve (6-5-1), which we can write

$P(y_r) = C e^{-y_r^2/(2\sigma_r^2)}$

where C is a constant which need not concern us here. The probability
that all of a given set of deviations will occur is the product of the
probabilities that each will occur, according to equation 5-6-1. If we call the
several deviations $(y_r)_1$, $(y_r)_2$, and so forth, then the probability that all
of the given set will occur is

$P[(y_r)_1 \text{ and } (y_r)_2 \text{ and } \ldots] = C e^{-(y_r)_1^2/(2\sigma_r^2)} \cdot C e^{-(y_r)_2^2/(2\sigma_r^2)} \cdots$

If we let N be the number of deviations in the set and if we combine the
exponents according to equation 3-3-1, we can write this

$P[(y_r)_1 \text{ and } (y_r)_2 \text{ and } \ldots] = C^N e^{-[(y_r)_1^2 + (y_r)_2^2 + \cdots]/(2\sigma_r^2)}$

Using the summation notation for the bracketed part of the exponent,
this becomes

$P[\text{that all the } y_r\text{'s will occur}] = C^N e^{-\Sigma y_r^2/(2\sigma_r^2)}$

The quantity $\Sigma y_r^2$ is the sum of the squares of the deviations. Since this
sum is preceded by a minus sign in the equation, we can see by referring to
equation 3-3-6 that, as the sum becomes larger and larger, the probability
of obtaining the given set of deviations becomes smaller and smaller. In
other words, the larger the sum of the squares of a given set of residuals,
the smaller the probability that the set of residuals will occur. The set of
residuals 0, 1, 0, 0, and 5 is less likely to occur than the set 2, 3, 2, 2, and 1,
because the sum of the squares of the first set is 26 while that for the second
set is only 22.
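The comparison of the two sets of residuals can be sketched numerically. The value assigned to `SIGMA_R` below is an arbitrary assumption made only so that the exponentials can be evaluated; as the text notes, the true standard deviation of the random part is unknown in practice. The constant factor $C^N$ is omitted because it is the same for any two sets of equal size.

```python
import math


def sum_of_squares(residuals):
    """Sum of the squares of a set of residuals."""
    return sum(r * r for r in residuals)


def relative_probability(residuals, sigma_r):
    """exp(-sum(y_r^2) / (2 * sigma_r^2)), with the factor C**N left out."""
    return math.exp(-sum_of_squares(residuals) / (2 * sigma_r ** 2))


first_set = [0, 1, 0, 0, 5]
second_set = [2, 3, 2, 2, 1]
SIGMA_R = 2.0  # assumed purely for illustration

print(sum_of_squares(first_set), sum_of_squares(second_set))  # 26 22

ratio = (relative_probability(first_set, SIGMA_R)
         / relative_probability(second_set, SIGMA_R))
print(ratio < 1)  # True: the first set is the less likely of the two
```

Whatever positive value is assumed for `SIGMA_R`, the ratio remains below one, since it depends only on the difference between the two sums of squares.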
For the present purpose we use this principle with a reversed
orientation. We have shown that for a given line, the most likely to occur of
several sets of observations is that set which makes the sum of the squares of
the deviations from the given line a minimum. Now if we have one set of
observations and several competing candidates for the line of best fit, it is
reasonable to choose the one in the light of which the observations are most
likely, that is, the one for which the sum of the squares of the deviations
is a minimum.

This is a very important principle in statistics, sometimes referred to as
the principle of least squares. A generalized statement of the principle is
this: Of several competing hypotheses which are in other respects equal
in merit, the one which makes the sum of the squares of the deviations a
minimum is the most likely to be correct.

This principle is used in many fields. In order to familiarize the reader
with it, let us pause for a moment to study a simple example of its
application. A scientist measures a fixed quantity, and obtains a series of values
$x_1, x_2, \ldots, x_N$, which differ slightly from each other because of small
uncontrollable errors of his instruments. In reporting his conclusions,
he must choose one of these values as the one most likely to be correct, or
perhaps he will choose a value computed from them, such as the arithmetic
mean, or the median, or some other kind of average. Suppose that in
order to make a wise choice he postpones a decision and temporarily calls
this unknown best value “X.” Then the deviations from X will be
$x_1 - X$, $x_2 - X$, and so forth, and the sum of the squares of these deviations
will be

$\Sigma d^2 = (x_1 - X)^2 + (x_2 - X)^2 + \cdots = \Sigma(x - X)^2$

where we have used the summation notation for convenience. We can
express this in a still simpler form if we divide by N to convert both sides
to arithmetic means:

$\Sigma d^2/N = \overline{d^2} = \Sigma(x - X)^2/N = \overline{(x - X)^2}$

or, if we multiply this out and apply equations 3-10-2, 3-10-4, and 3-10-5,

$\overline{d^2} = \overline{x^2 - 2xX + X^2} = \overline{x^2} - 2\bar{x}X + X^2$

To see how this can be made a minimum, let us add $\bar{x}^2$ to the right-hand
side and then subtract it again. This gives us

$\overline{d^2} = (X^2 - 2\bar{x}X + \bar{x}^2) + \overline{x^2} - \bar{x}^2$

or

$\overline{d^2} = (X - \bar{x})^2 + \overline{x^2} - \bar{x}^2$

This is the quantity which we wish to make a minimum by a suitable
choice of X. The quantities $\overline{x^2}$ and $\bar{x}^2$ do not contain X and so cannot
be altered by our choice. The term $(X - \bar{x})^2$ can never be negative,
since it is the square of a number. We can make this term smaller and
smaller by choosing X closer and closer to $\bar{x}$, and the smallest possible
value (zero) occurs when X equals $\bar{x}$. In other words, if we choose X
equal to $\bar{x}$, we will make the sum of the squares of the deviations as small
as it can possibly be made. The best choice of a “most likely value of
x” is therefore the arithmetic mean, rather than the median or any other
possible type of average.
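This conclusion can be checked by direct trial. In the sketch below the five readings are hypothetical values invented for illustration, and the search over a grid of candidate values of X is our own device; the text's argument is algebraic and needs no such search.

```python
from statistics import mean, median


def sum_sq_dev(values, X):
    """Sum of the squares of the deviations of the values from X."""
    return sum((x - X) ** 2 for x in values)


# Hypothetical repeated measurements of one fixed quantity.
readings = [10.1, 9.8, 10.4, 10.0, 11.2]
x_bar = mean(readings)

# Try candidate values of X from 9.50 to 11.50 in steps of 0.01.
candidates = [9.5 + 0.01 * k for k in range(201)]
best = min(candidates, key=lambda X: sum_sq_dev(readings, X))

print(round(x_bar, 2), round(best, 2))
# The mean gives a sum of squares no larger than the median does.
print(sum_sq_dev(readings, x_bar) <= sum_sq_dev(readings, median(readings)))
```

The candidate that gives the smallest sum of squares is the one lying nearest the arithmetic mean, as the algebra requires.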

7. GRAPHICAL PROCEDURE 

With the principle of least squares in mind, it is possible for a careful
worker to draw by inspection a line of adequate accuracy for most
purposes. This is best accomplished by plotting the points as rather
conspicuous large dots, so that the whole configuration can be taken in at a
glance, and then laying a transparent ruler along the dots and shifting it
around until the points which lie above the line are well balanced by the
points which lie below. In this operation, it must be kept in mind that
one large deviation is more to be avoided than two smaller ones, each
half as large, since the square of the large one is much larger than the sum
of the squares of the two small ones. One should carefully avoid the
temptation to secure a perfect fit with almost all the points at the expense
of a very large deviation for one or two points.

As an example of these principles, let us fit a line to the data given in
the first two columns of Table 8-7-1. The points are plotted in Figure 8-7-1,
and a straight line has been drawn by inspection, in such a way as to


Figure 8-7-1. A Poorly Drawn Line.

Figure 8-7-2. A Better Line.


illustrate the error described in the paragraph above. The deviations are
the vertical distances between the points and the line; and it is readily
seen that four of these deviations have been made small at the expense of
the last one, which is as a result excessively large. A better least squares
line, fitted to the same data, is shown in Figure 8-7-2. Here the line has
been readjusted so that there is no outstandingly large deviation, and we
should expect this to prove to be a better line.
To test these two straight lines by the criterion of least squares, we
first obtain the equation of the first line by the methods of Article 5, and
Table 8-7-1 Least Squares Test of Line

     Data          Test of y_p = 0.5x + 2           Test of y_p = 0.7x - 3

   x      y      y_p   (y - y_p)   (y - y_p)^2    y_p   (y - y_p)   (y - y_p)^2

  10      5       7       -2           4            4        1           1
  20     15      12        3           9           11        4          16
  30     15      17       -2           4           18       -3           9
  40     20      22       -2           4           25       -5          25
  50     35      27        8          64           32        3           9

  Sum                                 85                                60


find it to be $y_p = 0.5x + 2$. Next, we compute the values of $y_p$
corresponding to each value of x (column 3 of Table 8-7-1), and we then find the
deviations of the original y's from these computed y's (column 4). Finally,
we square these deviations (column 5) and add the results. We see that
the sum of the squares of the deviations from the first line is 85. To
apply the test to the second line we obtain its equation in a similar manner
($y_p = 0.7x - 3$) and carry out the same computation as before (columns
6, 7, and 8). We see that the sum of the squares of the deviations from
the second line is only 60. Thus the second line is better than the first
line. If a third line can be drawn so that the sum of the squares of the
deviations is even less than 60, it will be better than either of the two
tested, in the sense that it will have a higher probability of being the
correct one. This method of testing a line by evaluating the sum of the
squares of its deviations is rarely needed in practice, and the primary
purpose of this article is to illustrate the criterion of least squares. In
practice the best possible line is usually found, not by trial and error,
but by a single mathematical procedure which will be described in the
following article.
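The computation of Table 8-7-1 can be sketched as follows. The function name `sum_squared_deviations` is ours; the data and the two trial lines are those of the table.

```python
# Data from the first two columns of Table 8-7-1.
XS = [10, 20, 30, 40, 50]
YS = [5, 15, 15, 20, 35]


def sum_squared_deviations(m, b, xs=XS, ys=YS):
    """Sum of (y - y_p)^2, where y_p = m*x + b is the value given by the line."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


first_line = sum_squared_deviations(0.5, 2)    # y_p = 0.5x + 2
second_line = sum_squared_deviations(0.7, -3)  # y_p = 0.7x - 3

print(round(first_line, 6), round(second_line, 6))  # 85.0 60.0
```

Any further trial line can be tested in the same way by passing its own values of m and b.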

If the points on the diagram are numerous and scattered, so that it is
difficult to “see” a line in the configuration, it may be helpful to replace
each pair of points by a “master point” halfway between the two and
thus reduce the number of points and also reduce the scatter. If the scatter
is still excessive, pairs of master points can be combined. This process should
be used with reserve and carried only far enough to make it possible to
draw a fairly unambiguous line. An example of replacing scattered points
by master points is shown in Figure 8-7-3.

Figure 8-7-3. The Use of Master Points.
PROBLEMS

1 Plot the mathematics grades (second column of Table 1-4-4) against the
entrance exam scores (first column). Combine the points by pairs to form master
points, then combine these by pairs. Draw a straight line by inspection and write
its equation.

2 The scores made by five children on a part of an intelligence test are

Child's Age     Points Scored
    (x)              (y)

     2               12
     3               15
     3               13
     5               16
     7               20

Plot these points, draw a straight line fitting them as well as possible, and write
its equation.

3 Two students fitted the following two lines to the data in the preceding
problem:

(a) y = 1.5x + 9.4

(b) y = 1.7x + 8.6

Test these two lines by the criterion of least squares and state which is better.
Retain your answers for future use.

8. LEAST SQUARES LINE 

In the preceding article we have considered the means by which an
investigator could, by visual inspection, fit a line which will satisfy the
least squares criterion approximately. In this article we will show how
we can, by a mathematical procedure, obtain in one step a line which
makes the sum of the squares of the deviations an absolute minimum and
which is therefore the best possible fit out of all the infinite number of
possible lines which might be fitted to the data.
Since we are assuming that the line of best fit is to be a straight line,
we can write

$y_p = mx + b$ (8-8-1)

If we insert this in 8-6-1, we obtain for the deviations,

$y_r = y - y_p = y - (mx + b)$

The sum of the squares of the deviations can therefore be written

$\Sigma(y - y_p)^2 = \Sigma[y - (mx + b)]^2$

Let us divide both sides of this by N and rewrite it in terms of arithmetic
means. Then if we expand the right-hand side and simplify, we have

$\overline{(y - y_p)^2} = \overline{y^2} + m^2\overline{x^2} + b^2 - 2m\overline{xy} - 2b\bar{y} + 2mb\bar{x}$ (8-8-2)
This is the term which we wish to make as small as possible. In order to
accomplish this, we are free to try various straight lines, that is, various
values of m and b, until we have reduced the entire right-hand side to its
smallest possible value.

To see how we should choose m and b to accomplish this, let us consider
for a moment a simpler experimental problem of the same type. Suppose
that we wish to choose m in such a way as to make $m^2 - 6m + 14$ a
minimum. We could begin by rewriting it in the form $(m^2 - 6m + 9) + 5$,
or

$m^2 - 6m + 14 = (m - 3)^2 + 5$ (8-8-3)

Now, obviously, since it is the square of a number, $(m - 3)^2$ cannot be
negative, and the smallest value it can possibly attain is zero. To make
it zero, we must make m = 3. This is therefore the answer to our
experimental problem.

In order to apply this procedure to equation 8-8-2, we must group the
terms containing m and express the entire quantity as the square of a
quantity containing m, plus other quantities which do not contain m, just as
we did in equation 8-8-3. The result of this grouping is

$\overline{(y - y_p)^2} = \overline{x^2}\left[\left(m - \frac{\overline{xy} - b\bar{x}}{\overline{x^2}}\right)^2 - \left(\frac{\overline{xy} - b\bar{x}}{\overline{x^2}}\right)^2 + \frac{\overline{y^2} + b^2 - 2b\bar{y}}{\overline{x^2}}\right]$ (8-8-4)

Our objective is to adjust m in such a way as to make the above expression
as small as possible. Only the first parenthesis in the brackets contains m,
and, following the same logic as that used in the experimental problem,
we choose m in such a way as to make this parenthesis zero. Placing the
parenthesis equal to zero and solving for m, we have

$m = \frac{\overline{xy} - b\bar{x}}{\overline{x^2}}$ (8-8-5)


This is the value of m which will make the sum of the squares of the
deviations as small as it can be made by adjusting m, but it is not yet a useful
result because it contains b, which is itself a quantity which we must adjust
in an effort to reduce the sum of the squares of the deviations. To find
the best value for b, we must repeat the above procedure, but this time we
must group the terms in such a way that they consist of the square of a
term containing b, plus other terms not containing b. Grouping the terms
in 8-8-2 in this way, we have

$\overline{(y - y_p)^2} = [b - (\bar{y} - m\bar{x})]^2 + m^2\overline{x^2} - 2m\overline{xy} - (\bar{y} - m\bar{x})^2 + \overline{y^2}$

Only the first brackets contain b, and again we put this quantity equal to
zero. This gives us

$b - (\bar{y} - m\bar{x}) = 0$ (8-8-6)
If we solve this for b and substitute the result in equation 8-8-5, we can
then solve the resulting equation for the desired value of m. The result is

$m = \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - \bar{x}^2}$

The denominator is equal to $\sigma_x^2$ by equation 4-4-1, and it is customary to
write the equation for m in the form

$m = \frac{\overline{xy} - \bar{x}\bar{y}}{\sigma_x^2}$ (8-8-7)

If all the values of x and y are very large, it is convenient to use another
form of this equation, in terms of the deviations from their means:

$m = \frac{\overline{(x - \bar{x})(y - \bar{y})}}{\sigma_x^2}$ (8-8-8)


You can readily verify the fact that this is identical to 8-8-7 by multiplying 
out the numerator m the right-hand side, applying equations 3-10-2, 3-10-4, 
and 3-10-5, and collecting terms 

As soon as we have obtained m from either of these equations, we can 
obtain b most quickly by using equation 8-8-6 m the form 


b = y - mx (8-8-9) 

These values of m and b can now be inserted in the equation $y_p = mx + b$.
The resulting equation is then the one which will make the sum of the
squares of the deviations an absolute minimum and which has therefore
the highest possible probability of expressing exactly the relation between
x and the predictable part of y, as distinguished from the random part.

An alternative form for the equation of the line of best fit is obtained
if we substitute b from 8-8-9 directly into $y_p = mx + b$, in which case we
obtain

$$y_p = \bar{y} + m(x - \bar{x}) \tag{8-8-10}$$

To demonstrate the procedure of using these equations, let us again
use the data in the first two columns of Table 8-7-1. The work can be
organized as shown in Table 8-8-1.

Table 8-8-1. Least Squares Procedure

            x            y            xy            x^2
           10            5            50            100
           20           15           300            400
           30           15           450            900
           40           20           800           1600
           50           35          1750           2500
Sum       150           90          3350           5500
Mean       30           18           670           1100
       ($\bar{x}$)  ($\bar{y}$)  ($\overline{xy}$)  ($\overline{x^2}$)

From the way in which they are computed in these operations, we see that
30 is $\bar{x}$, 18 is $\bar{y}$, 670 is $\overline{xy}$, and 1100 is $\overline{x^2}$.
Using equation 4-4-1, we find that $\sigma_x^2 = 200$. Inserting these
values in equation 8-8-7, we have


$$m = \frac{670 - 30 \times 18}{200} = 0.65$$

Inserting this value in 8-8-9, we obtain

$$b = 18 - (0.65)(30) = -1.5$$

The equation of the least squares line is then obtained by placing these
values in 8-8-1:

$$y_p = 0.65x - 1.5 \tag{8-8-11}$$

which is the required line. The sum of the squares of the deviations from
this line should be less than those from any other possible line. If we
apply the least squares test as demonstrated in Table 8-7-1, we obtain 57.5
for the sum of the squares of the deviations, and we know from the theory
outlined above that no other straight line can produce a smaller sum of
the squares of the deviations, and thus no other straight line has a larger
probability of being correct.
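
The arithmetic of this example can be checked with a short Python sketch.
The function names (mean, least_squares_line, sum_sq_dev) are illustrative
choices and not part of the text; the formulas are equations 4-4-1, 8-8-7,
and 8-8-9, and the data are the first two columns of Table 8-8-1.

```python
# Data from the first two columns of Table 8-8-1.
xs = [10, 20, 30, 40, 50]
ys = [5, 15, 15, 20, 35]


def mean(values):
    """Arithmetic mean of a sequence (or generator) of numbers."""
    values = list(values)
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Return (m, b) computed from equations 8-8-7 and 8-8-9."""
    x_bar, y_bar = mean(xs), mean(ys)
    xy_bar = mean(x * y for x, y in zip(xs, ys))
    x2_bar = mean(x * x for x in xs)
    var_x = x2_bar - x_bar ** 2            # equation 4-4-1
    m = (xy_bar - x_bar * y_bar) / var_x   # equation 8-8-7
    b = y_bar - m * x_bar                  # equation 8-8-9
    return m, b


def sum_sq_dev(m, b, xs, ys):
    """Sum of the squared deviations of the points from y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


m, b = least_squares_line(xs, ys)
best = sum_sq_dev(m, b, xs, ys)
print(m, b, best)  # approximately 0.65, -1.5, 57.5

# Any nearby line should give a larger sum of squares.
for dm, db in [(0.05, 0.0), (-0.05, 0.0), (0.0, 1.0), (0.0, -1.0)]:
    print(dm, db, sum_sq_dev(m + dm, b + db, xs, ys) > best)
```

Trying a few perturbed lines does not prove the minimum, which rests on the
algebra above, but it is a quick guard against slips in the hand computation.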

PROBLEMS 

1. Write a mathematical proof of equation 8-8-8, using equation 8-8-7 as a
starting point.

2. Fit a least squares line to the data in Problem 2 of the preceding section.
Test this line by the criterion of least squares and compare your results with those
obtained in Problem 3 in that section.

3. Fit a least squares line to the relationship between age (x) and height (y)
in Table 3-9-1.

4. Write a complete derivation of equation 8-8-7, supplying and explaining all
the missing steps.

5. Apply the least squares test to equation 8-8-11. Compare your result with
the results in Table 8-7-1 and explain its significance.


9. SIMPLE CURVILINEAR CURVE FITTING 

In any curve fitting operation, there is an unavoidable tendency to
simplify the relationship, since the scatter of the individual points
frequently obscures the finer details of the relationship and reveals only
the general trend. For this reason a straight line fit is appropriate for a
very large percentage of statistical problems. If, however, the points
show an unmistakable curvature, then an attempt should be made to
express the relationship in the form

$$Y = mX + b$$

where Y stands for any mathematical expression depending upon y alone
and X is one depending upon x alone. In some cases the form of the
expressions X and Y is given by independent knowledge of restrictions which
the variates are known to fulfil, but generally they are chosen by the
investigator by trial and error. In the latter case the investigator should
begin by trying the simplest possible expressions for X and Y, and should
proceed to more complex forms only if the simple ones fail. Some
suggested forms for X are $x^2$, $\sqrt{x}$, $x^3$, $\log x$, $1/x$, and so forth.



Figure 8-9-1. Population of the United States. 

For an example of a situation in which the form of the curve is suggested
by the nature of the data, let us consider Table 8-9-1, which gives the
population of the continental United States from 1790 to 1860. To avoid
the use of large numbers in the equations we have let x stand for the year,
measured from 1825. If we now plot y against x (Figure 8-9-1), we see
that the points lie along a curve which becomes steeper as x increases
and that no straight line can be made to fit the data very well.

Table 8-9-1. Population of United States*

Year       x        Pop. (y)      Log Pop.
1790     -35       3,929,214       6.5943
1800     -25       5,308,483       6.7249
1810     -15       7,239,881       6.8597
1820      -5       9,638,453       6.9840
1830      +5      12,866,020       7.1096
1840     +15      17,069,453       7.2322
1850     +25      23,191,876       7.3653
1860     +35      31,443,321       7.4975

*Reprinted from Encyclopaedia Britannica, 1941, Volume 22, page 732, by
permission.

But we saw in Article 6 of Chapter 3 that it is reasonable to expect
populations to increase in accordance with a different law, namely, that the
logarithm of the population should increase uniformly with time. Using this
law, we would expect the relationship between x and y to be given by an
equation of the form

$$\log y = mx + b \tag{8-9-1}$$

This equation leads us to expect that if we plot x against log y we should
obtain approximately a straight line. To test this hypothesis, we tabulate
the logs of each value of y (column 4 of Table 8-9-1) and plot them against
x (Figure 8-9-2). Since the points now fall almost along a straight line,
we regard the hypothesis as acceptable and proceed to fit an exact least
squares line to the numbers in the second and fourth columns of Table
8-9-1, using the methods which we have already studied. The result is

$$Y = 7.0459 + 0.01282X$$

where Y is log y and X is simply x. In terms of the original variables, this
is

$$\log y = 7.0459 + 0.01282x$$

which is the required curve. Using this equation, we can now compute a
few predicted values of y, plot them, and draw a smooth curve for
comparison with the original data. Such a curve has been drawn in Figure
8-9-1.
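
The logarithmic fit can be reproduced with a brief Python sketch. It assumes
common (base 10) logarithms, as the tabulated values indicate, and it
recomputes the logarithms from the populations instead of copying the
four-place figures, so the last digit of the results may differ slightly.
The variable names are illustrative.

```python
import math

# Years and populations from Table 8-9-1.
years = [1790, 1800, 1810, 1820, 1830, 1840, 1850, 1860]
population = [3929214, 5308483, 7239881, 9638453,
              12866020, 17069453, 23191876, 31443321]

X = [year - 1825 for year in years]        # x, measured from 1825
Y = [math.log10(p) for p in population]    # Y = log y

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
xy_bar = sum(x * y for x, y in zip(X, Y)) / n
var_x = sum(x * x for x in X) / n - x_bar ** 2   # equation 4-4-1

m = (xy_bar - x_bar * y_bar) / var_x   # equation 8-8-7
b = y_bar - m * x_bar                  # equation 8-8-9
print(round(b, 4), round(m, 5))        # close to 7.0459 and 0.01282

# Predicted populations from log y = b + m*x, for comparison with the data.
for year, x, p in zip(years, X, population):
    print(year, p, round(10 ** (b + m * x)))
```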

PROBLEMS 

1. Fit a curve of the form 8-9-1 to the data in Table 8-9-1, and verify the results
given in the accompanying discussion.

2. In the following table, x is the distance in miles of each of eight communities
from a large city and y is the per capita amount which residents of each community
spend in the city each month.

     x       y
    10     $94
    28      35
    33      31
    57      20
    61      16
    64      18
    87       9
    92      12

Assuming that the relationship between these variables is of the form
$y = m(1/x) + b$, fit the best possible curve to this data.

10. SUMMARY 

Curve fitting is the operation of finding what the relationship is
between two variables, while correlation theory is a study of the strength
of that relationship. The investigator who works on one of these two topics
is usually interested in the other as well, and there is considerable economy
of operation in undertaking them simultaneously. The following summary
of operations should be used, therefore, only if you wish to fit a curve to
a set of data but do not wish to measure the correlation. If you wish to
measure the correlation as well, then you should use instead the summary of
operations at the end of the following chapter.

There are two procedures which can be used to obtain the line of best
fit. The first is approximate, the second is exact.

I. APPROXIMATE PROCEDURE

1. Graph the data.

2. If the points on the graph are numerous and scattered, replace each
pair of them by a more conspicuous point halfway between them. Repeat
if necessary. In any case, be certain that the final points are large and
conspicuous, and that they form a visual pattern which stands out from
the rest of your diagram.

3. If the pattern of points forms an unmistakable curve which cannot
be reconciled with any straight line, then the methods of part III of this
summary should be used. Before reaching this conclusion, however, you
should make certain that the deviation from a straight line is established
by a number of points, and not by an isolated one or two. Remember that
you are trying to separate the systematic or predictable part of the
variation from the random part, and if you draw a curve which bends to reach
a single point, you may have mistakenly included a random variation in
your estimate of what is predictable. Your subsequent predictions will
then erroneously ascribe this particular random deviation to all other
cases having a similar value of x. In most practical cases of curve fitting
a straight line is adequate, and the cases genuinely requiring curvilinear
equations are rare.

4. If you decide that the points are representable by a straight line,
draw a best fitting line by inspection. Remember that it is generally bad
to balance one large deviation against several small ones, unless you
estimate their squares and adjust the line so that the sum of the squares
of the deviations is minimized. If the points are numerous and well
distributed, a simple visual bisection of the pattern of points is usually
sufficient.

5. Extend the line to cut the y axis, and read off the value of y at this
point. (Remember that y is negative if the line crosses below the origin.)
Call this value b.

6. Select any two points on the line (not two of the original plotted
points) and label them “1” and “2.” Then compute m from equation
8-4-1:

$$m = \frac{y_2 - y_1}{x_2 - x_1}$$

7. Insert m and b in the equation $y_p = mx + b$. This is the equation of
the line of best fit, and may be used for predicting values of y to be expected
for any value of x.


II. EXACT PROCEDURE 

1. List the data as in Table 8-8-1, and compute xy and $x^2$ for each entry.
Add all columns, and then divide each of the resulting sums by the number
of observations. Label the results $\bar{x}$, $\bar{y}$, $\overline{xy}$, and
$\overline{x^2}$ as shown at the bottom of the columns in Table 8-8-1.

2. Compute $\sigma_x^2$ from $\sigma_x^2 = \overline{x^2} - \bar{x}^2$.

3. Compute m from equation 8-8-7.

4. Compute b from equation 8-8-9:

$$b = \bar{y} - m\bar{x}$$

5. Insert these in $y_p = mx + b$. This is the equation of the line of best
fit. A complete demonstration is found in Table 8-8-1 and the discussion
following it.

III. SIMPLE CURVILINEAR CURVE FITTING

1. Plot the points. If the resulting graph displays a well-established
curvature, which cannot be explained on the hypothesis of random
deviations from a straight line, then the following procedure is applicable.

2. Select the type of curve to be fitted. This may be given by the
conditions of the problem, or it may be selected as a result of trial and error
on the part of the investigator. For the methods here described to be
applicable, the equation must be of the form $Y = mX + b$, where Y
depends upon y alone and X upon x alone. Some typical trial values for
X are $x^2$, $x^3$, $\sqrt{x}$, $\log x$, $1/x$, and so forth.

3. Compute X and Y for each entry in the table of data.

4. Fit a straight line to the variables X and Y, using either Method
I or Method II.

5. In the resulting straight line equation, replace X and Y by their
values in terms of x and y. This is the desired equation. The procedure
is illustrated in Article 9.
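
The steps of part III can be sketched in Python as follows. The helper names
fit_line and fit_transformed are hypothetical, and the four data points are
invented for illustration: they were generated without scatter from
y = 6/x + 2, so that the fit should recover m = 6 and b = 2.

```python
def fit_line(X, Y):
    """Exact procedure (part II): return (m, b) for Y = m*X + b."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    xy_bar = sum(x * y for x, y in zip(X, Y)) / n
    var_x = sum(x * x for x in X) / n - x_bar ** 2   # equation 4-4-1
    m = (xy_bar - x_bar * y_bar) / var_x             # equation 8-8-7
    b = y_bar - m * x_bar                            # equation 8-8-9
    return m, b


def fit_transformed(xs, ys, fx, fy):
    """Part III: compute X = fx(x) and Y = fy(y), then fit a straight line."""
    X = [fx(x) for x in xs]
    Y = [fy(y) for y in ys]
    return fit_line(X, Y)


# Invented, scatter-free data lying exactly on y = 6/x + 2.
xs = [1, 2, 3, 6]
ys = [8, 5, 4, 3]

# Trial form: X = 1/x, Y = y.
m, b = fit_transformed(xs, ys, fx=lambda x: 1 / x, fy=lambda y: y)
print(m, b)  # close to 6 and 2, so the fitted curve is y = 6(1/x) + 2
```

Passing the two transformations as arguments keeps the straight-line fit
unchanged, so other trial forms (a logarithm for Y, a square for X, and so
forth) can be tested by changing only fx and fy.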



CHAPTER 9

SIMPLE CORRELATION


1. INTRODUCTION 

In the preceding chapter, we studied the methods by which a line of
best fit may be found, and the methods by which it can be used for
estimating or predicting the value of one variable when the corresponding
value of the other is known. In order to make the best possible use of
such a prediction, it is necessary that we know something about its
reliability or its limits of trustworthiness. Without such information, the
prediction would be of little use as a basis for a practical decision. For
this purpose we need a measure of the degree or strength of the dependence
of one variable upon another. If the strength of the relationship is very
limited, that is, if the two variables are nearly independent, then any
prediction of one from the other would be of little value, while if they are
closely related, the prediction can be made with a high accuracy.

In a general way, the strength of the relationship is at once apparent
from a visual inspection of a graph of the two variables. In Figure 9-1-1,
for example, the number of chirps per minute for 115 crickets has been
plotted against the temperature in degrees Fahrenheit. We see that all
the points fall almost exactly upon a straight line, and we draw the rather
surprising conclusion that there is a very close relationship between the
frequency of cricket chirps and the temperature. It would be possible to
estimate the temperature with high precision by counting the chirps of
crickets, or, conversely, to predict the frequency of cricket chirps very
accurately if we know the temperature.

In Figure 9-1-2, we see the results of graphing the estimated brain
weight of members of the United States Senate against their legislative
ability, as estimated by a complicated scoring system from their
legislative records. The points scatter so widely that it is almost impossible
to guess where the line of best fit should be. It is evident that there is
practically no relationship between brain weight and legislative ability, and
that if we estimated legislative ability from brain weight, our results would
be so uncertain as to be of extremely little use. It would clearly be very
unwise to choose senators on the basis of measurements of their brain
weights.

Figure 9-1-1. Temperature and Cricket Chirps. (Based upon data by
Bert E. Holmes. Reprinted by permission of Prentice-Hall, Inc., from
"Applied General Statistics" by Croxton and Cowden. Copyright 1939 by
Prentice-Hall, Inc. See also "Vocal Thermometers," "Scientific Monthly,"
September, 1927.)

Figure 9-1-2. Ability and Brain Weight. (Based upon data by Arthur
MacDonald. Reprinted by permission of Prentice-Hall, Inc., from "Applied
General Statistics" by Croxton and Cowden. Copyright 1939 by
Prentice-Hall, Inc.)

In the above examples, we can see by an inspection of the graphs that
the degree of relationship, or correlation, between chirps and temperature
is much greater than the correlation between legislative ability and brain
weight. Why then do we need a more exact measurement of “degree of
relationship,” or “correlation”? To answer this question let us examine
the following situations.

I. A school screens applicants for admission on the basis of an entrance
examination. A graph of examination scores plotted against later
performance in school shows a large scatter, but an obvious relationship.
The entrance officials are examining a new test, which, they think, might
make more accurate predictions possible and permit a selection of better
candidates. A graph of the results of the new test, however, also shows a
considerable scatter, and it is impossible to tell, from a visual inspection
of the graphs, which is better. Some kind of exact numerical
measurement of the correlation is needed in order to choose the better test.

II. A scientist has shown that the rate of tree growth is, on the average,
somewhat more rapid during years when there are many spots on the
surface of the sun, and slow in years when there are few spots. He has,
in other words, discovered a correlation between sunspot abundance and
rate of tree growth. In publishing his results he can save time and words
if it is possible to express the amount of the correlation by giving a single
number which will be immediately and precisely understood by other
scientists.

III. It is evident from the scientist’s results that the rate of growth of
trees is not wholly controlled by spot abundance, that is, that spot
abundance does not constitute 100 per cent of the cause of changes in rate of
growth. But what percentage does it constitute? Has he explained only
10 per cent of the total cause or causes, leaving 90 per cent unexplained, or
has he succeeded in pinning down as much as 30 per cent of the causation
of variation in tree growth?

From the above examples, it is seen that we have two objectives. First,
we wish to describe a standard numerical measure of the degree of
correlation, and secondly, we wish to relate this, so far as possible, to the
somewhat elusive concept of “percentage causation,” or “percentage of related
variation.”

PROBLEMS

1. Using the data in Table 3-9-1, plot height against age, and then plot I.Q.
against age. In which case do you think that there is the greater correlation?
Retain your answers for comparison with later results.

2. Collect a set of data in which you think that some correlation might
conceivably exist and make a graph of your results. Your data will be most useful
for further exercises if you collect three pieces of information about each individual.
The possibilities for collecting such data are almost unlimited, but if you find a
choice difficult, the following suggestions are offered.

(a) Height, weight, and age of college students

(b) Grades, hours of study, and I.Q.’s of college students

(c) Grades, hours of study, and number of dates per week

(d) Length, width, and number of veins in a set of leaves selected at random
from a plant

(e) Alcohol consumption per year, cigarettes smoked per year, and any index
of general health such as number of days of illness last year

(f) Number of aces in a bridge hand, number of cards in the longest suit, number
of tricks won by the hand

(g) Grades, number of hours of employment per week, pay per hour

(h) Shoe size, length of span of right hand, length of span of left hand

2. COEFFICIENT OF DETERMINATION 

Our objective is to find a measure for the degree to which x and y are
related or dependent, and the degree to which they are unrelated or
independent. For this purpose, let us begin by dividing each value of y
into two parts, one of which is completely dependent upon x, and the other
of which is completely independent of x. We can then expect to measure
relatedness by some kind of comparison between the sizes of these two
parts. We have already discussed such a separation of the components
of y in Article 6 of Chapter 8, where we introduced the notation $y_p$, or
“y predicted,” for the part of y which is related to x, and which is exactly
predictable if we know the corresponding value of x.

In some cases this mathematical separation of y into components
corresponds to an obvious physical separation into related and unrelated
parts. For example, if x is the measured radius and y the measured
circumference of each of a set of circles, and if we have measured them
to the nearest tenth of a millimeter, then the separation into components
is obvious: $y_p$ is the true circumference and is exactly related to x by the
equation $y_p = 2\pi x$, and $y - y_p$ is a small error of measurement, positive
or negative, which is completely independent of x. We will see, however,
that the process of separating the components can be carried out whether
the investigator sees such a physical separation or not. The mathematical
analysis shows him how large the two components are, and he can then,
with this information, construct a hypothesis to explain their relative
sizes.

For the sake of the student who is not accustomed to following a
discussion involving a number of abstract symbols, let us examine a specific
problem as we proceed. Five students compared notes on the length of
time each spent in study for an examination and discovered the following
data:



Student      Hours of Study (x)    Score on Exam (y)
Jackson              5                    26
Petoskey             4                    17
Goldberg             3                    17
Bellini              2                    11
Schwartz             1                    14

If we fit a straight line to these data by the method of least squares
(Article 8, Chapter 8), we obtain

$$y_p = 3x + 8$$

This is the best possible equation for predicting the score which other
students will make on the examination if we know only how long they
have studied for it. If we test the equation on the five men above to see
how well it would have predicted their scores, we find the following:

Student       x      y     y_p    y - y_p
Jackson       5     26     23        3
Petoskey      4     17     20       -3
Goldberg      3     17     17        0
Bellini       2     11     14       -3
Schwartz      1     14     11        3

Thus we see that our hypothesis about the separability of y leads us to the
conclusion that of the 26 points Jackson scored, 23 were predictable from
a knowledge of the length of time which he studied for the examination,
and 3 were unrelated to time of study and therefore not predictable from it.
At first glance it might appear that the fraction 23/26 might be used as
a measure of the success of the equation $y_p = 3x + 8$ in predicting Jackson's
grade. A little reflection, however, will show that it would be foolish to
compare the related part, $y_p$, directly with the original value of y. The
numerical size of y depends upon such extraneous factors as the zero point
chosen by the investigator in tabulating his results. One investigator
might convert the letter grades A, B, C, D, and F into numbers by
assigning the values 1, 2, 3, 4, and 5 to them; another might use the numbers
0, 1, 2, 3, and 4; still another might record only the number of mistakes
which the examinee made, in which case the best grades would be those
with the smallest values of y. It is clearly not very revealing to ask
“What fraction of y does the equation succeed in predicting?” and we
should ask instead “What fraction of the deviation of y from its mean does
the equation succeed in predicting?”

If we rewrite our illustrative problem in terms of deviations from the
mean, we will have the results shown in Table 9-2-1.


Table 9-2-1. Components of the Deviations

Student       y     y_p     y - ȳ          y_p - ȳ         y - y_p
                            (Tot. dev.)    (Expl. dev.)    (Unexpl. dev.)
Jackson      26     23         9               6                3
Petoskey     17     20         0               3               -3
Goldberg     17     17         0               0                0
Bellini      11     14        -6              -3               -3
Schwartz     14     11        -3              -6                3

The average of all the y’s is 17, and Jackson's score deviates from this
average by 9, which we call the total deviation. His predicted score is 23,
which deviates from the mean by 6, and we will call this the explained
deviation. This is the part of the total deviation which is predicted exactly
by the equation, and is therefore totally dependent upon x. The remaining
part of the total deviation, $y - y_p$, or 3 in the case of Jackson, is the residual
amount which cannot be predicted from a knowledge of x, and is totally
independent of x. We will call this part the unexplained deviation. The
relations between these quantities are shown in Figure 9-2-1.
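
The three columns of deviations can be generated mechanically. The Python
sketch below takes the line $y_p = 3x + 8$ as given from the text and uses
illustrative variable names; it also confirms that, for every student, the
total deviation is the sum of the explained and unexplained deviations.

```python
students = ["Jackson", "Petoskey", "Goldberg", "Bellini", "Schwartz"]
hours = [5, 4, 3, 2, 1]          # x
scores = [26, 17, 17, 11, 14]    # y

m, b = 3, 8                      # least squares line y_p = 3x + 8 (from the text)
y_bar = sum(scores) / len(scores)

rows = []
for name, x, y in zip(students, hours, scores):
    y_p = m * x + b
    total = y - y_bar            # total deviation
    explained = y_p - y_bar      # explained deviation
    unexplained = y - y_p        # unexplained deviation
    rows.append((name, total, explained, unexplained))

for name, total, explained, unexplained in rows:
    # Each total deviation splits exactly into the other two parts.
    print(name, total, explained, unexplained, total == explained + unexplained)
```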

We are seeking some way of averaging each of these three sets of
deviations so that we can define the “percentage dependence” and the
“percentage independence” in some such way as follows:

$$\begin{aligned}
\text{Percentage dependence} &= \frac{\text{Average explained deviation}}{\text{Average total deviation}} \\
\text{Percentage independence} &= \frac{\text{Average unexplained deviation}}{\text{Average total deviation}}
\end{aligned} \tag{9-2-1}$$

To make these definitions complete, it remains only to specify what kind
of averages we should use. At first glance it might appear that we are
free to choose any kind of average which might be convenient, but a little
further reflection will show that we are not quite free. If we are to interpret
the results as percentages of dependence and independence, respectively,
then an obvious property which they must possess is that they must total 100
per cent, and we must choose a method of averaging which will bring this
about. If, for example, y is 80 per cent dependent upon x, then it must
necessarily be 20 per cent independent of x. If this restriction were not
present we might use the average of the absolute values of the deviations,
or the root mean squares of the deviations, or their logarithmic means,
or the means of their cubes, or any other kind of average which might
appear useful. We could use any of these in equations 9-2-1 to give us a
kind of measure of relatedness, but we can interpret the results as
percentage dependence and percentage independence only if the sum of the two
percentages equals 1. We will now prove that this will be true if we use
one specific kind of average, namely the average of the squares of the
deviations, or the variance.

To prove this important fact, we must express in comparable terms the
averages of the squares of each of the three kinds of deviations in equations
9-2-1. Let us begin with the average of the squares of the explained
deviations. We shall call this average the explained variance:

$$\text{Expl. Var.} = \overline{(y_p - \bar{y})^2} = \overline{[(mx + b) - \bar{y}]^2}$$

where we have substituted for $y_p$ its value, mx + b, from equation 8-8-1.
Now let us substitute for b its value, $\bar{y} - m\bar{x}$, given by equation 8-8-9:

$$\text{Expl. Var.} = \overline{(mx + \bar{y} - m\bar{x} - \bar{y})^2} = \overline{[m(x - \bar{x})]^2} = m^2\,\overline{(x - \bar{x})^2}$$

We recognize $\overline{(x - \bar{x})^2}$ as $\sigma_x^2$ from equation 4-3-2, and the
above equation becomes

$$\text{Expl. Var.} = m^2\sigma_x^2 \tag{9-2-2}$$

If we proceed similarly with the unexplained deviations, $y - y_p$, we will
obtain the unexplained variance:

$$\begin{aligned}
\text{Unexpl. Var.} &= \overline{(y - y_p)^2} = \overline{[y - (mx + b)]^2} \\
&= \overline{[y - mx - \bar{y} + m\bar{x}]^2} \\
&= \overline{[(y - \bar{y}) - m(x - \bar{x})]^2}
\end{aligned} \tag{9-2-3}$$

If we square out the right-hand side and express the results as three separate
means, we will have

$$\text{Unexpl. Var.} = \overline{(y - \bar{y})^2} - 2m\,\overline{(y - \bar{y})(x - \bar{x})} + m^2\,\overline{(x - \bar{x})^2}$$

The terms $\overline{(y - \bar{y})^2}$ and $\overline{(x - \bar{x})^2}$ are the squares of
$\sigma_y$ and $\sigma_x$, respectively, from equation 4-3-2. The term
$\overline{(y - \bar{y})(x - \bar{x})}$ is the numerator in the formula for m
(equation 8-8-8), and is therefore equal to $m\sigma_x^2$. Making
these substitutions, we have

$$\text{Unexpl. Var.} = \sigma_y^2 - 2m(m\sigma_x^2) + m^2\sigma_x^2 = \sigma_y^2 - m^2\sigma_x^2 \tag{9-2-4}$$

Equations 9-2-2 and 9-2-4 give us expressions for the explained variance
and the unexplained variance, and we need in addition only the total
variance, or the average of the squares of the total deviations. This we
recognize immediately from equation 4-3-2 as the square of the standard
deviation of y:

$$\text{Total Var.} = \overline{(y - \bar{y})^2} = \sigma_y^2 \tag{9-2-5}$$

Using these variances for the ratios in equations 9-2-1, we obtain:

$$\frac{\text{Explained variance}}{\text{Total variance}} = \frac{m^2\sigma_x^2}{\sigma_y^2} \tag{9-2-6}$$

$$\frac{\text{Unexplained variance}}{\text{Total variance}} = \frac{\sigma_y^2 - m^2\sigma_x^2}{\sigma_y^2} \tag{9-2-7}$$

If we add these two, we have

$$\frac{\text{Expl. Var.}}{\text{Tot. Var.}} + \frac{\text{Unexpl. Var.}}{\text{Tot. Var.}} = \frac{m^2\sigma_x^2}{\sigma_y^2} + \frac{\sigma_y^2 - m^2\sigma_x^2}{\sigma_y^2} = 1 \tag{9-2-8}$$

Thus we have shown that if we use ratios of variances in equation 9-2-1,
we can interpret the results as percentages of dependence and
independence. We therefore adopt, as a measure of the percentage of the variation
in y which is related to variation in x, the explained variance divided by the
total variance. This quantity is called the coefficient of determination, or D.
The unexplained variance divided by the total variance is called the coefficient
of alienation, or A, and is obviously equal to 1 - D. It is the percentage
of variation in y which is independent of variation in x. A summary of
these relations is as follows:

$$D = \frac{\overline{(y_p - \bar{y})^2}}{\overline{(y - \bar{y})^2}} = \frac{\text{Expl. Var.}}{\text{Tot. Var.}} \tag{9-2-9}$$

$$A = 1 - D = \frac{\overline{(y - y_p)^2}}{\overline{(y - \bar{y})^2}} = \frac{\text{Unexpl. Var.}}{\text{Tot. Var.}} \tag{9-2-10}$$
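
As a numerical illustration, the following Python sketch computes D and A
for the five students of Table 9-2-1, taking the line $y_p = 3x + 8$ from the
text. For contrast it also forms the same two ratios with averages of
absolute deviations, one of the alternatives mentioned above. The helper
names mean_sq and mean_abs are illustrative.

```python
hours = [5, 4, 3, 2, 1]          # x
scores = [26, 17, 17, 11, 14]    # y
m, b = 3, 8                      # y_p = 3x + 8, from the text

y_bar = sum(scores) / len(scores)
predicted = [m * x + b for x in hours]

total = [y - y_bar for y in scores]
explained = [y_p - y_bar for y_p in predicted]
unexplained = [y - y_p for y, y_p in zip(scores, predicted)]


def mean_sq(devs):
    """Average of the squares of the deviations (a variance)."""
    return sum(d * d for d in devs) / len(devs)


def mean_abs(devs):
    """Average of the absolute values of the deviations."""
    return sum(abs(d) for d in devs) / len(devs)


D = mean_sq(explained) / mean_sq(total)      # equation 9-2-9
A = mean_sq(unexplained) / mean_sq(total)    # equation 9-2-10
print(D, A, D + A)                           # the sum is 1, up to rounding

abs_dep = mean_abs(explained) / mean_abs(total)
abs_indep = mean_abs(unexplained) / mean_abs(total)
print(abs_dep, abs_indep, abs_dep + abs_indep)   # the sum is not 1
```

With these data the variance ratios are about 0.71 and 0.29, while the ratios
of mean absolute deviations add to about 1.67 and so cannot be read as
percentages.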
PROBLEMS

Fill in the following spaces.

Problem    Expl. Var.    Unexpl. Var.    Tot. Var.      D        A
   1           22             28
   2           10                            90
   3                           5                       0.90
   4                                         20        0.75
   5                          20                                0.40
   6           14                                               0.30


3. COEFFICIENT OF CORRELATION
3. COEFFICIENT OF CORRELATION 

Most statisticians do not compute the coefficient of determination
directly, but compute instead a quantity called the coefficient of correlation,
(r), which is the square root of D. Sometimes this coefficient of correlation
is then squared to obtain D, but more often the results are reported by
giving the value of r alone, and it is left to the reader to compute D.

For reasons of mathematical convenience, r is given a plus or minus
sign to agree with the sign of m. In other words, r is assigned a positive
value if y increases when x increases, and a negative value if y decreases as
x increases. A summary of these relations, obtained by combining the
definition of r with the information in equations 9-2-9 and 9-2-6, is as
follows:

$$r = \text{Coef. of Corr.} = \pm\sqrt{D} = \pm\sqrt{\frac{\text{Expl. Var.}}{\text{Tot. Var.}}} = m\frac{\sigma_x}{\sigma_y} \tag{9-3-1}$$

Equations 9-2-9 and 9-3-1 define D and r, but are not very useful for
computing them. More rapid computational formulas will be developed
in later Articles.
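
The two routes to r in equation 9-3-1 can be compared numerically. The
Python sketch below uses the study-hours data of Article 2 and illustrative
variable names; it computes r once as the signed square root of D and once
as $m\sigma_x/\sigma_y$.

```python
import math

hours = [5, 4, 3, 2, 1]          # x
scores = [26, 17, 17, 11, 14]    # y
n = len(hours)

x_bar = sum(hours) / n
y_bar = sum(scores) / n
var_x = sum((x - x_bar) ** 2 for x in hours) / n     # sigma_x squared
var_y = sum((y - y_bar) ** 2 for y in scores) / n    # sigma_y squared
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores)) / n

m = cov / var_x                  # equation 8-8-8
D = m ** 2 * var_x / var_y       # equation 9-2-6

r_from_D = math.copysign(math.sqrt(D), m)                 # sign of r follows m
r_direct = m * math.sqrt(var_x) / math.sqrt(var_y)        # equation 9-3-1
print(r_from_D, r_direct)        # both about 0.85
```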

PROBLEMS 

1 to 6. Compute r for each of the problems at the end of Article 2 

4. COMPARISON BETWEEN r AND D 

The student will perhaps feel that it is wasteful and unnecessary to
master two separate statistical terms which measure, in different ways,
exactly the same thing, namely, closeness of relationship. In order to see
why both are in use by statisticians, and why both are useful, let us examine
briefly the merits of both in a specific problem.

Suppose that in a given situation y is dependent partly upon x, and
partly also upon another variable z which has also been measured by the
investigator, and suppose also that x is totally independent of z. We now
have several possible coefficients of determination between various pairings
of x, y, and z; let us distinguish them by subscripts. Let us suppose that
$D_{xy}$ is 0.3 and $D_{zy}$ is 0.5. Since z and x are independent, $D_{xz}$ is zero. Now,
in view of the meaning of the coefficient of determination, we can assert
that 30 per cent of the total variance in y is predictable from changes in
x, and that 50 per cent is predictable from changes in z. Since x is
independent of z, the 30 per cent does not overlap or duplicate any of the 50
per cent, and we can deduce that 80 per cent of the total variance in y is
predictable from variation in x and z, leaving only 20 per cent of it
unexplained. In other words, 20 per cent of the total variation in y is related
to neither x nor z and is presumably caused by other factors not measured
by the investigator.

This additive property is not possessed by r. In this example $r_{xy}$ is the
square root of $D_{xy}$, or 0.55, and $r_{yz}$ is the square root of $D_{yz}$, or 0.71. The
sum of these two does not measure the total correlation in any way.

We can state this additive property in another way. If y is known to
be completely determined by several variables, x, z, u, and v, which are all
independent of each other, then

$$D_{yx} + D_{yz} + D_{yu} + D_{yv} = 1 \tag{9-4-1}$$
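
The additive property can be illustrated with a small constructed example
in Python. The data are invented for the purpose: x, z, and u are three
mutually uncorrelated columns of +1 and -1, and y is built from them with
weights chosen so that the coefficients of determination come out to 0.3,
0.5, and 0.2, matching the figures discussed above. The function name
determination is illustrative; it uses the fact that D is the square of r.

```python
import math

# Invented columns with zero means and zero cross-products.
x = [1, -1, 1, -1, 1, -1, 1, -1]
z = [1, 1, -1, -1, 1, 1, -1, -1]
u = [1, 1, 1, 1, -1, -1, -1, -1]

# y is completely determined by x, z, and u.
y = [math.sqrt(3) * a + math.sqrt(5) * c + math.sqrt(2) * e
     for a, c, e in zip(x, z, u)]


def determination(xs, ys):
    """Coefficient of determination D between xs and ys (D = r squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((a - x_bar) * (c - y_bar) for a, c in zip(xs, ys)) / n
    var_x = sum((a - x_bar) ** 2 for a in xs) / n
    var_y = sum((c - y_bar) ** 2 for c in ys) / n
    return cov ** 2 / (var_x * var_y)


D_yx = determination(x, y)
D_yz = determination(z, y)
D_yu = determination(u, y)
print(D_yx, D_yz, D_yu, D_yx + D_yz + D_yu)   # about 0.3, 0.5, 0.2, and 1
print(determination(x, z))                     # 0.0: x and z are unrelated
print(math.sqrt(D_yx) + math.sqrt(D_yz))       # about 1.25: the r's do not add
```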

There is another reason for considering r a somewhat clumsy
mathematical device. The nature of the relationship between x and y is described
by the line of best fit, and the strength of the relationship is described by D.
But r is a composite quantity which describes the strength of the
relationship by its absolute value, and part, but not all, of the nature of the
relationship by its plus or minus sign.

It is suggested that you regard r as an intermediate mathematical step
and D as the final objective. In short, r should be studied because it is
widely used in statistical reports and because it is a useful mathematical
tool for a variety of purposes; but it is recommended that when you read
a report containing a value of r, you should mentally square it to obtain
D for the purpose of interpreting the results.

It is sometimes useful to think of D somewhat loosely as the
“percentage causation,” although it is important to notice that we know
nothing about the nature of the causation from the size of D or r.
Variations in x may be causing variations in y, or vice versa, or both may be
caused by variations in a third variable which was not measured by the
investigator, or x and y may have varied together by chance. The
interpretation of r and D will be discussed at a later point.

PROBLEMS 

1. In a given agricultural area, it was found that the coefficient of correlation
($r$) between the amount of spring rainfall and the size of the annual hay crop is
0.80. What fraction of the variance in annual hay crop is caused by or related to
spring rainfall?

2. What is the largest possible coefficient of correlation between size of hay
crop and any variable which is completely independent of spring rainfall, such as
the amount of fertilizer applied by the farmer?

3. Would a larger value of $r$ than this be possible between hay crop and average
temperature? Explain.

5. NUMERICAL COMPARISON OF VARIANCES 

The additive property of the variances, proved in the preceding article
and described in equation 9-2-8, is a very important one. It is well worth
while to follow our numerical example through a further step in order to
demonstrate this property, and to show that it is not possessed by other
kinds of averages of the components of the deviations.

In order to compute the variances we must square all the deviations in
Table 9-2-1. The complete computations are shown in Table 9-5-1.


Table 9-5-1. Comparison of Variances

Student     $x$    $y$   $y_p$   $y-\bar{y}$   $(y-\bar{y})^2$   $y_p-\bar{y}$   $(y_p-\bar{y})^2$   $y-y_p$   $(y-y_p)^2$
Jackson      5     26     23          9              81                6                36               3            9
Petoskey     4     17     20          0               0                3                 9              -3            9
Goldberg     3     17     17          0               0                0                 0               0            0
Bellini      2     11     14         -6              36               -3                 9              -3            9
Schwartz     1     14     11         -3               9               -6                36               3            9
Sum         15     85                               126                                 90                           36
Mean         3     17                              25.2                                 18                          7.2
                                               (Tot. Var.)                         (Expl. Var.)               (Unexpl. Var.)


The variance of the observed $y$'s is 25.2. This is the total variance, and we wish
to account for as much of it as possible in our predicting equation. The
variance of the predicted values is 18.0; this is the variance which we have
succeeded in accounting for by the equation $y_p = 3x + 8$; it is, in other
words, the explained variance. The variance of the remaining deviations
after the explained portions have been removed is 7.2; this is the
unexplained variance. We see that the sum of the explained variance and
the unexplained variance is the total variance, verifying the important
principle proved in Article 2. Notice that this property is not possessed
by any other kind of average of the deviations. If, for example, we form
the mean of the absolute values of the deviations, we have:

Mean total deviation = 3.6
Mean explained deviation = 3.6
Mean unexplained deviation = 2.4

Thus this method of averaging will not give us additive averages.
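The figures of Table 9-5-1 can be reproduced mechanically. The following Python sketch recomputes the three variances and the three mean absolute deviations for the five students; the helper `mean` and the list names are our own and are not part of the text.

```python
# Data for the five students of Table 9-5-1.
x = [5, 4, 3, 2, 1]        # hours of study
y = [26, 17, 17, 11, 14]   # examination scores


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


y_bar = mean(y)
y_p = [3 * xi + 8 for xi in x]  # predicted scores from y_p = 3x + 8

total = [yi - y_bar for yi in y]                      # y - y_bar
explained = [ypi - y_bar for ypi in y_p]              # y_p - y_bar
unexplained = [yi - ypi for yi, ypi in zip(y, y_p)]   # y - y_p

# Means of the squared deviations (variances): these are additive.
var_total = mean([d ** 2 for d in total])
var_expl = mean([d ** 2 for d in explained])
var_unexpl = mean([d ** 2 for d in unexplained])

# Means of the absolute deviations: these are not additive.
mad_total = mean([abs(d) for d in total])
mad_expl = mean([abs(d) for d in explained])
mad_unexpl = mean([abs(d) for d in unexplained])

print("variances:", var_total, var_expl, var_unexpl)
print("mean absolute deviations:", mad_total, mad_expl, mad_unexpl)
```

Only the squared deviations satisfy total = explained + unexplained; the absolute deviations give 3.6 on the left and 3.6 + 2.4 on the right.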
We can now compute the coefficients of determination, alienation, and
correlation from these variances. The equation succeeds in explaining
the fraction 18/25.2, or 71 per cent, of the total variance; thus the
coefficient of determination is 0.71. The remaining fraction, 7.2/25.2, or
29 per cent of the total variance, is left unexplained; therefore the
coefficient of alienation is 0.29. The coefficient of correlation, $r$, is the
square root of 0.71, which is 0.85. We interpret these results as follows:
The score which a student makes on the examination is 71 per cent
controlled by or related to the length of time which he studies for it, and
29 per cent controlled by other factors, such as his native ability or his
knowledge prior to studying.

It is important to notice that coefficients of correlation and of determination
do not contain any information about the nature of the causation. In
the illustrative problem, we know only that it is possible to predict 71
per cent of the variance in $y$ from the corresponding values of $x$. We do
not know whether changes in $x$ cause changes in $y$, or whether it is the
changes in $y$ which cause the changes in $x$, or whether both are partially
caused by other variables. Any hypothesis about the cause of the relationship
between $x$ and $y$ must be made from whatever additional knowledge
the statistician may have about the specific situation; it cannot be deduced
from the size of $D$ or $r$. This is a fertile source of error in interpreting
statistical analysis, and it should be emphasized that the statistician must
be familiar with the data and its sources as well as the statistical methods
of treatment. There is no substitute for common sense in interpreting
coefficients of correlation.

It is also important to note that the value of $D$ measures only the
relationship which we have been able to incorporate into our straight-line
equation. The true degree of relationship may be greater than this, but
if the relationship is curvilinear, for example, it cannot be encompassed
in any straight-line equation and so will be partly lost.

6. BASIC COMPUTATIONAL FORMULA FOR r 

The definition of $r$ given in Article 3 is completely general and can be
used later to define a coefficient of "multiple correlation" in situations
where several related variables are used simultaneously for prediction,
and also for a "coefficient of non-linear correlation" in situations in which
a curve is fitted to the data instead of a straight line. Like many fundamental
definitions, it is not very useful for computing the thing it defines,
and our next task will be to set up various formulas for rapid computation
of $r$.

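To obtain a basic computational formula, let us recall that $m$ is equal
to $(\overline{xy} - \bar{x}\bar{y})/\sigma_x^2$ from equation 8-8-7, and let us substitute this into
equation 9-3-1:

$$r = m\,\frac{\sigma_x}{\sigma_y} = \frac{\overline{xy} - \bar{x}\bar{y}}{\sigma_x^2}\cdot\frac{\sigma_x}{\sigma_y} = \frac{\overline{xy} - \bar{x}\bar{y}}{\sigma_x \sigma_y} \tag{9-6-1}$$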



164 


INTRODUCTION TO THE THEORY OF STATISTICS 


[CH 9 


or, if we obtain $m$ from equation 8-8-8 instead,

$$r = \frac{\overline{(x - \bar{x})(y - \bar{y})}}{\sigma_x \sigma_y} \tag{9-6-2}$$

In words, equation 9-6-1 tells us that the coefficient of correlation between
$x$ and $y$ is equal to the average value of $xy$, minus the average value of $x$
times the average value of $y$, all divided by the product of the standard
deviations of $x$ and $y$. To see how this equation is used, let us apply it to
the illustrative problem in Article 2.

We begin by computing $\bar{x}$, $\bar{y}$, $\overline{xy}$, $\overline{x^2}$, and $\overline{y^2}$ by means of the
computations shown in Table 9-6-1.


Table 9-6-1. Computation of r

          $x$     $y$    $xy$   $x^2$    $y^2$
           5      26     130     25      676
           4      17      68     16      289
           3      17      51      9      289
           2      11      22      4      121
           1      14      14      1      196
Sum       15      85     285     55     1571
Mean       3      17      57     11     314.2


We then compute $\sigma_x$ and $\sigma_y$ from equation 4-4-1, and insert these in 9-6-1:

$$r = \frac{57 - 3 \times 17}{\sqrt{11 - 3^2}\,\sqrt{314.2 - 17^2}} = 0.85$$

If $x$ and $y$ are simple numbers, we can sometimes use equation 9-6-2
instead. In this case we begin by computing $(x - \bar{x})(y - \bar{y})$, $(x - \bar{x})^2$,
and $(y - \bar{y})^2$ as shown in Table 9-6-2.


Table 9-6-2. Alternative Procedure for r

          $x$     $y$   $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})^2$   $(y-\bar{y})^2$   $(x-\bar{x})(y-\bar{y})$
           5      26         2             9              4               81                  18
           4      17         1             0              1                0                   0
           3      17         0             0              0                0                   0
           2      11        -1            -6              1               36                   6
           1      14        -2            -3              4                9                   6
Sum       15      85                                     10              126                  30
Mean       3      17                                      2             25.2                   6


We next obtain $\sigma_x$ and $\sigma_y$ from 4-3-2, and have for $r$,

$$r = \frac{6}{\sqrt{2}\,\sqrt{25.2}} = 0.85$$

In general, equation 9-6-1 produces faster results than equation 9-6-2.
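The agreement of the two formulas can be verified directly. The Python sketch below applies equations 9-6-1 and 9-6-2 to the five pairs of Tables 9-6-1 and 9-6-2; the function names `r_from_raw_means` and `r_from_deviations` are hypothetical labels of our own.

```python
import math

x = [5, 4, 3, 2, 1]
y = [26, 17, 17, 11, 14]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def r_from_raw_means(x, y):
    """Equation 9-6-1: uses the means of x, y, xy, x^2 and y^2."""
    x_bar, y_bar = mean(x), mean(y)
    xy_bar = mean([a * b for a, b in zip(x, y)])
    sigma_x = math.sqrt(mean([a * a for a in x]) - x_bar ** 2)
    sigma_y = math.sqrt(mean([b * b for b in y]) - y_bar ** 2)
    return (xy_bar - x_bar * y_bar) / (sigma_x * sigma_y)


def r_from_deviations(x, y):
    """Equation 9-6-2: uses deviations from the two means."""
    x_bar, y_bar = mean(x), mean(y)
    dx = [a - x_bar for a in x]
    dy = [b - y_bar for b in y]
    sigma_x = math.sqrt(mean([d * d for d in dx]))
    sigma_y = math.sqrt(mean([d * d for d in dy]))
    return mean([a * b for a, b in zip(dx, dy)]) / (sigma_x * sigma_y)


r1 = r_from_raw_means(x, y)
r2 = r_from_deviations(x, y)
print(f"r (9-6-1) = {r1:.2f}, r (9-6-2) = {r2:.2f}, D = {r1 ** 2:.2f}")
```

Both routes give the same $r$, and squaring it recovers the coefficient of determination found in Article 5.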


PROBLEMS 

1. Write a mathematical proof that equations 9-6-1 and 9-6-2 are equivalent.

2.* Compute the coefficient of correlation between the ages and the intelligence
test points for the five children described in Problem 2, Chapter 8, Article 7.
Compute $D$ and $A$.

3.* Compute $r$, $D$, and $A$ for the relationship between age and height of the
five children described in Table 3-9-1. (Use equation 9-6-2.)


7. REGRESSION EQUATIONS 

A statistician sometimes computes a value of $r$ for its own sake and does
not wish to use the prediction equations. Sometimes he is interested only
in a prediction equation and does not care about knowing the value of $r$.
More frequently, however, he wishes to measure the degree of correlation
and also to set up an equation for estimating or predicting one of the
variables. In this latter case, some simplification can be achieved by
combining the two procedures.

Suppose that we have begun by computing the value of $r$ from equation
9-6-1 or 9-6-2. The prediction equation, from equation 8-8-10, is:

$$y_p = \bar{y} + m(x - \bar{x})$$

Equation 9-3-1 tells us that $r = m\sigma_x/\sigma_y$. If we solve this for $m$ and
substitute the result in the above equation, we have

$$y_p = \bar{y} + \frac{\sigma_y}{\sigma_x}\,r\,(x - \bar{x}) \tag{9-7-1}$$

If we wish to predict or estimate $x$ from a known value of $y$, the roles of
the two variables are reversed, and the prediction equation** is

$$x_p = \bar{x} + \frac{\sigma_x}{\sigma_y}\,r\,(y - \bar{y}) \tag{9-7-2}$$

These prediction equations are frequently called the "regression" equations,
and the constants in them are called coefficients of regression. The
following terminology is often used:

$$\text{Coefficient of regression of } y \text{ on } x = b_{yx} = \frac{\sigma_y}{\sigma_x}\,r \tag{9-7-3}$$

$$\text{Coefficient of regression of } x \text{ on } y = b_{xy} = \frac{\sigma_x}{\sigma_y}\,r \tag{9-7-4}$$

The regression equations are frequently written in terms of $b_{yx}$ and $b_{xy}$:

$$y_p = \bar{y} + b_{yx}(x - \bar{x}) \tag{9-7-5}$$

and

$$x_p = \bar{x} + b_{xy}(y - \bar{y}) \tag{9-7-6}$$

When plotted, the line 9-7-1 (or 9-7-5) is called the line of regression of
$y$ on $x$, and the line 9-7-2 (or 9-7-6) is called the line of regression of $x$ on $y$.

*Retain your computations for future use.

**It is interesting to note that these equations become extremely simple if we express
$x$ and $y$ in $t$ units. Equation 9-7-1 becomes $(t_y)_p = r t_x$, 9-7-2 becomes $(t_x)_p = r t_y$, and
9-6-2 becomes $r = \overline{t_x t_y}$.
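As an illustration of the two regression equations, the following Python sketch computes $b_{yx}$ and $b_{xy}$ for the five students of Article 5 and wraps equations 9-7-5 and 9-7-6 in two small functions. The names `predict_y` and `predict_x` are our own, and the data are those of Table 9-6-1.

```python
import math

x = [5, 4, 3, 2, 1]        # hours of study
y = [26, 17, 17, 11, 14]   # examination scores


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


x_bar, y_bar = mean(x), mean(y)
dx = [a - x_bar for a in x]
dy = [b - y_bar for b in y]
sigma_x = math.sqrt(mean([d * d for d in dx]))
sigma_y = math.sqrt(mean([d * d for d in dy]))
r = mean([a * b for a, b in zip(dx, dy)]) / (sigma_x * sigma_y)  # 9-6-2

b_yx = r * sigma_y / sigma_x   # coefficient of regression of y on x (9-7-3)
b_xy = r * sigma_x / sigma_y   # coefficient of regression of x on y (9-7-4)


def predict_y(x_value):
    """Line of regression of y on x (9-7-5)."""
    return y_bar + b_yx * (x_value - x_bar)


def predict_x(y_value):
    """Line of regression of x on y (9-7-6)."""
    return x_bar + b_xy * (y_value - y_bar)


print(f"b_yx = {b_yx:.3f}, b_xy = {b_xy:.3f}")
print(f"predicted score for 2.5 hours of study = {predict_y(2.5):.1f}")
print(f"estimated hours for a score of 20 = {predict_x(20):.2f}")
```

Here $b_{yx}$ reproduces the slope of $y_p = 3x + 8$. The two lines are different lines unless the correlation is perfect; the product $b_{yx} b_{xy}$ equals $r^2$, as follows at once from 9-7-3 and 9-7-4.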

PROBLEMS* 

1. Write the regression equation for predicting the height which a child will
reach at a given age, using the data in Table 3-9-1.

2. Using this result, estimate the height which a child of 13 should have.

3. Write the regression equation for estimating the age of a child when its
height is known.

4. If a child is 47 inches tall, what is his probable age?

5. Write the regression equation for estimating the points which a child of a
given age will score, using the data in Problem 2, Chapter 8, Article 7.

8. STANDARD ERROR OF ESTIMATE 

When a prediction has been made by means of the regression equations,
the result is of little value unless something is known about its precision.
If an oil company wishes to send a crew into a jungle region known to have
a serious health hazard, it might be useful to set up an equation for
predicting the rate of infection, based upon data concerning temperatures,
humidity, insect population, and infection rates in similar jungles, and to
know that this equation predicts an annual infection rate of nine cases of
the disease in question per thousand of inhabitants. But perhaps the
doctors, while able to handle a rate of 9 per thousand, would be utterly
unable to cope with a rate of 50 per thousand, and would be unable to
prevent the latter rate from snowballing into a disastrous epidemic. It
now becomes very important to know the probability that the predicted
rate might turn out to be this far in error. In this case, as in many others,
it is fully as important to know the range of reliability of the prediction
as it is to know the prediction itself.

If we assume that the errors of prediction, $y - y_p$, will be distributed
normally, we can solve all such problems by means of the normal curve
tables. The standard deviation of these errors is called the standard error
of estimate, and denoted by $S_y$:

$$S_y = \sqrt{\overline{(y - y_p)^2}} \tag{9-8-1}$$

*Retain your computations for future use.

Its use can be most readily explained by means of a demonstration.
Suppose that in the above example the doctors compute the standard
error of estimate and find that it is equal to 7. We know then, from our
knowledge of the normal curve, that the probability that the actual value
will turn out to be within 7 of the predicted value is 0.68; in other words,
the probability is 0.68 that the actual infection rate will be between 2 and
16 per 1000. If we wish to know the probability that the true rate of
infection will be as high as or higher than some other fixed number, say 20
per thousand, we proceed as follows: An occurrence of 20 would be a
deviation of 11 from the expected value of 9, and if we divide this deviation
by 7 we will convert it into $t$ units. Thus we find that $t = 11/7 = 1.57$,
and we find in Appendix V that for $t = 1.57$, the area is 0.4418. We
conclude that the probability is 0.44 that the rate will be between 9 and
20. To obtain the probability that the rate will be above 20 we must
subtract 0.4418 from 0.5000. This probability is 0.0582, or roughly 6
per cent. To apply this method to our original question, where the rate to
be examined is 50 per 1000, we find $t$ as before:

$$t = (50 - 9)/7 = 5.9$$

We look up $t = 5.9$ in Appendix V, and find that the residual area is less
than 0.000 000 003, so that the probability that the rate will exceed 50 is
negligible.
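Where no table of the normal curve is at hand, the same areas can be obtained numerically. The Python sketch below is only an illustration under the stated assumption of normally distributed errors; it uses the standard library function `math.erfc`, and the function names are our own. Because it does not round $t$ to two decimals before looking up the area, its results can differ from the table values in the third decimal place.

```python
import math


def upper_tail(t):
    """Area under the normal curve beyond t standard units above the mean."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def prob_exceeding(value, predicted, std_error):
    """Probability that the true value exceeds `value`."""
    t = (value - predicted) / std_error
    return upper_tail(t)


predicted_rate = 9.0   # predicted infections per thousand
S_y = 7.0              # standard error of estimate in the example

p_within_one = 1.0 - 2.0 * upper_tail(1.0)
p_above_20 = prob_exceeding(20.0, predicted_rate, S_y)
p_above_50 = prob_exceeding(50.0, predicted_rate, S_y)

print(f"P(rate between 2 and 16) = {p_within_one:.2f}")
print(f"P(rate above 20)         = {p_above_20:.3f}")
print(f"P(rate above 50)         = {p_above_50:.1e}")
```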

PROBLEMS 

1. In Article 2 we derived the equation $y_p = 3x + 8$ for predicting the examination
score ($y$) from the hours of study ($x$), based upon the data for five students.
Compute the predicted score for a student who works six hours.

2. What is the standard error of estimate of this prediction? (Use the data in
Table 9-5-1.)

3. What is the probability that his actual score will be between 26 and 27.7?
Above 27.7? Above 30? Below 15?

9. FORMULA FOR STANDARD ERROR OF ESTIMATE 

The definition of $S_y$ given in the preceding article is, like many defining
equations, not suitable for rapid computations. To find a faster way to
compute $S_y$, let us begin by squaring equation 9-8-1:

$$S_y^2 = \overline{(y - y_p)^2}$$

From equation 9-2-3 we see that this is simply the unexplained variance,
and from equation 9-2-4 we see that it can be written as follows:

$$S_y^2 = \sigma_y^2 - m^2\sigma_x^2$$




From 9-3-1 we see that $m = r(\sigma_y/\sigma_x)$. Making this substitution, we have:

$$S_y^2 = \sigma_y^2 - r^2\sigma_y^2 = \sigma_y^2(1 - r^2)$$

or, taking the square root:

$$S_y = \sigma_y\sqrt{1 - r^2} \tag{9-9-1}$$

This is the basic computational formula for $S_y$. To illustrate its use, let
us apply it to the problem of grade prediction discussed in Articles 2 and
5 of this chapter. We have

$$S_y = \sqrt{25.2}\,\sqrt{1 - 0.85^2} = 2.7$$

This tells us that if we use the equation $y_p = 3x + 8$ to predict the scores
which other students will make on the examination, and tabulate the
errors of prediction, we can expect the distribution of these errors to have
a standard deviation of 2.7. Therefore, for any one prediction, the
probability is 0.68 that the true score will be within 2.7 of the predicted score.

We can use this result in another way. Equation 6-8-1 tells us that the
probable error is 0.6745 times the standard deviation, and this gives us a
probable error of 1.8 for the predicted score. To see how this is used, let
us predict the score of a student named Don Poller who studied two and
one-half hours for the examination. His predicted score is 15.5, and the
probable error of the prediction is 1.8; that is, the chances are 50 per cent
that his true score will be within 1.8 units of 15.5. This can be expressed
in the customary way:

$$y_p = 15.5 \pm 1.8$$
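The equivalence of the defining equation 9-8-1 and the computational formula 9-9-1 can be checked on the same five students. The Python sketch below computes $S_y$ both ways and then the probable error; the variable names are our own. It carries $r$ at full precision rather than rounding it to 0.85 first, which is why the unrounded value of $S_y$ is about 2.68.

```python
import math

x = [5, 4, 3, 2, 1]        # hours of study
y = [26, 17, 17, 11, 14]   # examination scores


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Direct route (9-8-1): standard deviation of the errors y - y_p.
errors = [yi - (3 * xi + 8) for xi, yi in zip(x, y)]
S_direct = math.sqrt(mean([e * e for e in errors]))

# Computational route (9-9-1): S_y = sigma_y * sqrt(1 - r^2).
x_bar, y_bar = mean(x), mean(y)
dx = [a - x_bar for a in x]
dy = [b - y_bar for b in y]
sigma_x = math.sqrt(mean([d * d for d in dx]))
sigma_y = math.sqrt(mean([d * d for d in dy]))
r = mean([a * b for a, b in zip(dx, dy)]) / (sigma_x * sigma_y)
S_formula = sigma_y * math.sqrt(1.0 - r * r)

probable_error = 0.6745 * S_formula
predicted = 3 * 2.5 + 8   # Don Poller: two and one-half hours of study

print(f"S_y (9-8-1) = {S_direct:.2f}, S_y (9-9-1) = {S_formula:.2f}")
print(f"y_p = {predicted:.1f} +/- {probable_error:.1f}")
```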

Equation 9-9-1 tells us that $r$ must be fairly high if the regression equations
are to be useful. If we did not have the regression equations, the best
guess as to the value of a $y$ chosen at random would be $\bar{y}$, in which case
the standard deviation of the errors of estimation would be simply $\sigma_y$.
When we use the regression equation, the errors of estimation have a
standard deviation of $S_y$, which is equal to $\sigma_y$ multiplied by the fraction
$\sqrt{1 - r^2}$. If, for instance, $r$ is 0.4, then $S_y$ is $\sqrt{1 - 0.4^2}$ times $\sigma_y$, or
92 per cent of $\sigma_y$, and the size of the errors of prediction is reduced by
only 8 per cent when we use the regression equations, as compared to the
errors we would have made if we had assumed that every $y$ was simply
equal to $\bar{y}$. If $r$ is 0.6, then the errors of estimate are reduced by 20 per
cent, and if $r$ is 0.8, they are reduced by 40 per cent. The size of $r$ is a
deceptive guide to the usefulness of the relationship for predicting or
estimating, and the standard error of estimate should always be computed.
If, on the other hand, we are interested in predicting the average value
for a large number of individuals, then a smaller value of $r$ can be used
with good results. This situation will be discussed further in Chapter 10.
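The percentages quoted above follow directly from the factor $\sqrt{1 - r^2}$. A minimal Python sketch, with a function name of our own choosing, tabulates the reduction for the three values of $r$ mentioned in the text:

```python
import math


def error_reduction(r):
    """Fractional reduction in the standard error of estimate: 1 - sqrt(1 - r^2)."""
    return 1.0 - math.sqrt(1.0 - r * r)


for r in (0.4, 0.6, 0.8):
    remaining = math.sqrt(1.0 - r * r)
    print(f"r = {r:.1f}: S_y is {remaining:.0%} of sigma_y, "
          f"a reduction of {error_reduction(r):.0%}")
```

The reduction grows slowly at first and rapidly only as $r$ approaches 1, which is the sense in which the size of $r$ alone is a deceptive guide.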
Let us summarize the procedure for finding the coefficient of correlation,
the prediction equation, and the standard error of estimate.

First step. Compute $r$ from equation 9-6-1, or, if the values of $x$ and $y$
are large numbers and their range is small, from equation 9-6-2.

Second step. Using this value of $r$, write the prediction equation for $y$
from equation 9-7-1.

Third step. Again using this value of $r$, compute the standard error of
estimate from equation 9-9-1.

PROBLEMS 

1. Compute $S$ for the estimate of height which you made in Problem 2, Article 7.

2. What is the probability that the child's true height will be within 4 inches
of the predicted height?

3. Compute $S$ for the estimate of age which you made in Problem 4 of Article 7.

4. What is the probability that the child will actually be over 10 years of age?

10. PROCEDURE FOR LARGE VARIATES 

In many cases the investigator deals with values of $x$ and $y$ which are
very large, and equation 9-6-1 is then too laborious to use. In such cases
the work can be simplified by using the alternative equation for $r$,

$$r = \frac{\overline{(x - x_0)(y - y_0)} - \overline{(x - x_0)}\;\overline{(y - y_0)}}{\sigma_x \sigma_y} \tag{9-10-1}$$

where $x_0$ and $y_0$ stand for any convenient fixed numbers chosen by the
investigator. The proof of this equation is left to the student as an exercise.
The use of the equation is illustrated in Table 9-10-1, in which 12,300
has been chosen for $x_0$ and 2570 for $y_0$. The values of $\sigma_x$ and $\sigma_y$ are
computed from equation 4-4-3; otherwise the details of the computation are
self-explanatory.


Table 9-10-1. Change of Zero Point

          $x$       $y$    $x-x_0$   $y-y_0$   $(x-x_0)^2$   $(y-y_0)^2$   $(x-x_0)(y-y_0)$
        12303      2570       3         0           9             0               0
        12318      2591      18        21         324           441             378
        12314      2588      14        18         196           324             252
        12305      2571       5         1          25             1               5
        12310      2582      10        12         100           144             120
Sum                          50        52         654           910             755
Mean                         10      10.4       130.8         182.0           151.0

$$r = \frac{151.0 - 10 \times 10.4}{5.55 \times 8.59} = 0.986$$

An alternative procedure is to use equation 9-6-2, which will automatically
reduce the size of the numbers with which we must deal. The
procedure is illustrated in Table 9-10-2, with the same data as before.


Table 9-10-2. Use of Deviations from Mean

          $x$       $y$    $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})^2$   $(y-\bar{y})^2$   $(x-\bar{x})(y-\bar{y})$
        12303      2570        -7          -10.4            49             108.16                +72.8
        12318      2591         8           10.6            64             112.36                +84.8
        12314      2588         4            7.6            16              57.76                +30.4
        12305      2571        -5           -9.4            25              88.36                +47.0
        12310      2582         0            1.6             0               2.56                    0
Sum                             0              0           154             369.20               +235.0
Sum/N                           0              0          30.8              73.84                 47.0

Computations:

$$\bar{x} = 12310 \qquad \bar{y} = 2580.4$$

$$r = \frac{47}{\sqrt{30.8}\,\sqrt{73.84}} = 0.986$$

To use this method, we must begin by computing $\bar{x}$ and $\bar{y}$, which in this
case are 12310.0 and 2580.4. We then subtract these means from each
value of $x$ and $y$ in the table, and proceed as shown in Table 9-10-2. In
general, this is slower than the procedure demonstrated in Table 9-10-1.
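That the choice of zero point has no effect on $r$ can be confirmed numerically. In the Python sketch below, the function `r_shifted` is a hypothetical helper of our own that applies equation 9-10-1; the standard deviations are computed from the shifted values, which is legitimate because a shift of origin leaves a standard deviation unchanged. The third choice of $x_0$ and $y_0$ is arbitrary and is ours, not the text's.

```python
import math

x = [12303, 12318, 12314, 12305, 12310]
y = [2570, 2591, 2588, 2571, 2582]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def r_shifted(x, y, x0, y0):
    """Equation 9-10-1 with arbitrary zero points x0 and y0."""
    dx = [a - x0 for a in x]
    dy = [b - y0 for b in y]
    numerator = mean([a * b for a, b in zip(dx, dy)]) - mean(dx) * mean(dy)
    sigma_x = math.sqrt(mean([a * a for a in dx]) - mean(dx) ** 2)
    sigma_y = math.sqrt(mean([b * b for b in dy]) - mean(dy) ** 2)
    return numerator / (sigma_x * sigma_y)


r_table = r_shifted(x, y, 12300, 2570)        # zero points of Table 9-10-1
r_means = r_shifted(x, y, mean(x), mean(y))   # deviations from the means
r_other = r_shifted(x, y, 12000, 2500)        # another arbitrary choice

print(f"{r_table:.3f} {r_means:.3f} {r_other:.3f}")
```

Choosing the means themselves as zero points reduces 9-10-1 to 9-6-2, since the second term of the numerator then vanishes.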

PROBLEMS 

1. Write a mathematical proof of equation 9-10-1. (Hint: Show that it is
equivalent to equation 9-6-2.)

2. The following table shows the barometric pressure in inches on five occasions,
and the rainfall in inches during the subsequent 24 hours.

Pressure    Rainfall
  30.1        0.00
  28.2        0.74
  29.9        0.31
  29.1        0.00
  28.1        0.52

Compute the coefficient of correlation between these two variables. (Hint: Use
equation 9-10-1 with $x_0 = 29$ and $y_0 = 0.3$. Note that some of the quantities
involved will be negative.)

3. Find the coefficient of determination and the coefficient of alienation for the
above problem.

4. Write the regression equation for predicting the rainfall which will follow a
given barometer reading.

5. If the barometer reading is 28.0 inches, what rainfall should be expected
during the following 24 hours?

6. What is the probability that the actual rainfall will exceed 1 inch?

7. Find the coefficient of correlation between the voltages and amperages given
in Table 1-2-1. (Use equation 9-10-1 with a suitable choice of $x_0$ and $y_0$.)

11. PROCEDURE FOR GROUPED DATA* 

If the number of pairs of values of $x$ and $y$ is very large, a great saving of
time may be effected by forming a frequency tabulation of the data and
using suitably modified procedures. The frequency tabulation is formed
in the same way as in the case of a single variable, except that we now
require a class for each pair of values of $x$ and $y$. The tabulation is best
performed by means of a two-dimensional array of cells, in which each
column contains a class in $x$, and each row a class in $y$. The details of the
procedure are best made clear by an example. Table 9-11-1 shows the
scores made by twenty-eight students in 15 minutes and in 12 minutes on
the same test.


Table 9-11-1. Scores of Twenty-Eight Students on Two Examinations

 $x$   $y$      $x$   $y$      $x$   $y$      $x$   $y$
  41    32       40    32       71    66       36    31
  62    54       27    21       46    32       46    42
  43    39       64    59       56    48       44    41
  42    34       56    55       57    50       56    51
  57    53       37    21       68    64       61    55
  71    66       22    15       32    18       51    34
  60    52       38    28       59    57       44    40


To form a frequency tabulation of these data, we must
first choose a set of limits for $x$ and another for $y$. For convenience of
tabulation, we choose 10 for the interval in both cases, and construct a
tally sheet as shown in Table 9-11-2.

*Articles 11 and 12 are rather lengthy explanations of practical procedures for
finding $r$, which do not contribute to the development of the theory of statistics. If you
prefer to postpone these two articles until you have a need for the techniques, you can
now proceed directly to Article 13 without loss of continuity.
Table 9-11-2. Tally Sheet for Student Scores

                Limits in x    20-29    30-39    40-49    50-59    60-69    70-79
Limits in y     y \ x           24.5     34.5     44.5     54.5     64.5     74.5
10-19           14.5            |        |
20-29           24.5            |        ||
30-39           34.5                     |        |||||    |
40-49           44.5                              |||      |
50-59           54.5                                       |||||    ||||
60-69           64.5                                                |        ||


We now assign a value of $u$ to each class in $x$, letting $u = 0$ for the class
of largest frequency, and assigning the numbers $-1$, $-2$, etc., to the
successive classes of smaller $x$, and $+1$, $+2$, etc., to those of larger $x$,
exactly as we did in Chapter 4 in finding the mean and standard deviation.
The present situation is different, however, in that we now have
another variable, $y$, which must also be treated in the same way. We
assign the letter $v$ to the corresponding numbers for the $y$ classes as shown
in Table 9-11-3.


Table 9-11-3. Assignment of u and v

y \ x     24.5    34.5    44.5    54.5    64.5    74.5      $v$
14.5        1       1                                       -2
24.5        1       2                                       -1
34.5                1       5       1                        0
44.5                        3       1                        1
54.5                                5       4                2
64.5                                        1       2        3
$u$        -2      -1       0       1       2       3


In order to obtain the greatest possible advantage from
the simplicity of the new variables $u$ and $v$, we must now express the
coefficient of correlation in terms of the new variables instead of the old
ones. The relationship between $u$ and $x$, from 4-5-2, is $x = x_0 + C_x u$,
where we have added a subscript to the class interval $C$ in order to
distinguish it from the class interval in $y$. The value of $\bar{x}$ is $x_0 + C_x \bar{u}$, from
equation 4-5-4, and $x - \bar{x}$ is therefore equal to $C_x(u - \bar{u})$. Similarly,
$y - \bar{y}$ is equal to $C_y(v - \bar{v})$, where $C_y$ is the class interval in $y$. If we
substitute these in the standard equation for $r$ (9-6-2), we have

$$r = \frac{\overline{C_x(u - \bar{u})\,C_y(v - \bar{v})}}{C_x \sigma_u\, C_y \sigma_v}$$

where we have used the fact that $\sigma_x = C_x \sigma_u$ from equation 4-5-7. If we
multiply out the parentheses, simplify, and cancel the $C$'s, this becomes

$$r = \frac{\overline{uv} - \overline{u\bar{v}} - \overline{\bar{u}v} + \bar{u}\bar{v}}{\sigma_u \sigma_v}$$

With the aid of equation 3-10-4 we see that the last three terms in the
numerator are alike and can be combined, giving us

$$r = \frac{\overline{uv} - \bar{u}\bar{v}}{\sigma_u \sigma_v} \tag{9-11-1}$$

This tells us that the equation for the coefficient of correlation has exactly
the same form when we use the new units, $u$ and $v$, as it had when we used
the old ones, $x$ and $y$. We can therefore complete the problem of finding $r$
without making any further use of $x_0$, $y_0$, $C_x$, or $C_y$.

The student can work out his own procedure for computing $\overline{uv}$, $\bar{u}$, $\bar{v}$,
$\sigma_u$, and $\sigma_v$ in any given problem. There are several standard procedures
for carrying out the computations rapidly, of which two will be described
here.


FIRST METHOD 

This method should be used if most of the cells in the frequency tabulation
are empty, as in the problem being used here for illustration. If there
are more than about fifteen occupied cells, then the second method is
usually faster.

The first method is illustrated in Table 9-11-4. The specific steps are as
follows:

1. Make up a computing form like that shown in Table 9-11-4, and
enter the frequencies in the appropriate cells.

2. Add the frequencies across each row and enter the result under
column $f$ at the right. Similarly, add the frequencies in each column and
enter the result opposite $f$ at the bottom.

3. Assign a value of $u$ to each class in $x$, and a value of $v$ to each class
in $y$. It is usually best to let $u$ and $v$ equal zero for the classes with the
largest frequencies. The negative values of $u$ must be assigned to the
smaller values of $x$, and similarly for $v$.


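Table 9-11-4. First Method for Grouped Data

(Each occupied cell shows the frequency, followed in parentheses by the value of $uv$ for that cell.)

y \ x     24.5     34.5     44.5     54.5     64.5     74.5       $v$     $f$     $vf$    $v^2 f$
14.5      1 (4)    1 (2)                                          -2       2       -4        8
24.5      1 (2)    2 (1)                                          -1       3       -3        3
34.5               1 (0)    5 (0)    1 (0)                         0       7        0        0
44.5                        3 (0)    1 (1)                         1       4        4        4
54.5                                 5 (2)    4 (4)                2       9       18       36
64.5                                          1 (6)    2 (9)       3       3        9       27
$u$        -2       -1        0        1        2        3        Sum     28       24       78
$f$         2        4        8        7        5        2        28     Mean    0.857     2.79
                                                                                $=\bar{v}$   $=\overline{v^2}$
$uf$       -4       -4        0        7       10        6        15     0.536 $=\bar{u}$
$u^2 f$     8        4        0        7       20       18        57     2.04 $=\overline{u^2}$

$$\sigma_v = \sqrt{2.79 - (0.857)^2} = 1.43 \qquad \sigma_u = \sqrt{2.04 - (0.536)^2} = 1.32$$

$$\Sigma uv = 61 \qquad \overline{uv} = 2.18$$

$$r = \frac{2.18 - (0.536)(0.857)}{(1.32)(1.43)} = +0.91$$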


4. Multiply each value in the $f$ column by $v$ and enter the result in the
$vf$ column.

5. Multiply each of these values again by $v$ and enter the result in the
$v^2 f$ column.

6. Sum the $f$, $vf$, and $v^2 f$ columns and enter the result opposite $\Sigma$.

7. Divide the last two sums by $N$ to obtain $\bar{v}$ and $\overline{v^2}$. In the example,
we divide 24 by 28 to obtain 0.857, which is $\bar{v}$, and we divide 78 by 28 to
obtain 2.79, which is $\overline{v^2}$.

8. Compute $\sigma_v$ from the equation $\sigma_v = \sqrt{\overline{v^2} - \bar{v}^2}$. In the example,
$\sigma_v$ is 1.43.

9. Similarly, compute the values of $uf$ and $u^2 f$ and enter them in the
rows at the bottom of the table; sum these rows and divide by $N$ to obtain
$\bar{u}$ and $\overline{u^2}$; and then compute $\sigma_u$. In our example, $\bar{u}$ is 0.536, $\overline{u^2}$ is 2.04, and
$\sigma_u$ is 1.32.

10. Compute the value of $uv$ for each cell and write it in the corner of
the cell. Note that these will be negative when $u$ and $v$ are of opposite
sign. In our example, for the upper left-hand cell $u$ is $-2$ and $v$ is $-2$,
so that $uv$ is 4, which we write in the corner of the cell.

11. Multiply the frequency of each cell by the value of $uv$ for that cell
and add the results for all cells. In our example, the products are, reading
from left to right across each row, 1 × 4, 1 × 2, 1 × 2, 2 × 1, 1 × 0,
5 × 0, 1 × 0, 3 × 0, 1 × 1, 5 × 2, 4 × 4, 1 × 6, and 2 × 9. The sum
of all these products is 61.

12. Divide this sum by $N$ to obtain $\overline{uv}$. In our example, $\overline{uv}$ is 61/28 or
2.18.

13. Substitute $\overline{uv}$, $\bar{u}$, $\bar{v}$, $\sigma_u$, and $\sigma_v$ in equation 9-11-1. The computation
is shown at the bottom of the table. The final result is

$$r = +0.91$$
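The thirteen steps above can be sketched in Python, starting from the raw scores of Table 9-11-1. The constants and the helper `coded` are our own names; the zero classes are those chosen in the text (40-49 for $x$, 30-39 for $y$), and floor division is used here merely as a convenient way of assigning each score to its class number.

```python
import math

# The 28 pairs of scores (x, y) from Table 9-11-1.
scores = [
    (41, 32), (40, 32), (71, 66), (36, 31),
    (62, 54), (27, 21), (46, 32), (46, 42),
    (43, 39), (64, 59), (56, 48), (44, 41),
    (42, 34), (56, 55), (57, 50), (56, 51),
    (57, 53), (37, 21), (68, 64), (61, 55),
    (71, 66), (22, 15), (32, 18), (51, 34),
    (60, 52), (38, 28), (59, 57), (44, 40),
]

INTERVAL = 10        # class interval for both variables
X_ZERO_LOWER = 40    # lower limit of the x class given u = 0 (40-49)
Y_ZERO_LOWER = 30    # lower limit of the y class given v = 0 (30-39)


def coded(value, zero_lower):
    """Class number of a score relative to the class chosen as zero."""
    return (value - zero_lower) // INTERVAL


# Frequency tabulation: one entry per occupied (u, v) cell.
cells = {}
for xs, ys in scores:
    key = (coded(xs, X_ZERO_LOWER), coded(ys, Y_ZERO_LOWER))
    cells[key] = cells.get(key, 0) + 1

n = sum(cells.values())
sum_uf = sum(f * u for (u, v), f in cells.items())
sum_vf = sum(f * v for (u, v), f in cells.items())
sum_u2f = sum(f * u * u for (u, v), f in cells.items())
sum_v2f = sum(f * v * v for (u, v), f in cells.items())
sum_uvf = sum(f * u * v for (u, v), f in cells.items())

u_bar, v_bar = sum_uf / n, sum_vf / n
sigma_u = math.sqrt(sum_u2f / n - u_bar ** 2)
sigma_v = math.sqrt(sum_v2f / n - v_bar ** 2)
r = (sum_uvf / n - u_bar * v_bar) / (sigma_u * sigma_v)   # equation 9-11-1

print(n, sum_uf, sum_vf, sum_u2f, sum_v2f, sum_uvf)
print(f"r = {r:+.2f}")
```

The intermediate sums agree with the margins of Table 9-11-4, so the sketch can also serve as a check on hand computation.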

SECOND METHOD 

This method should be used if the number of cells containing entries is
very large. To illustrate this method, we will use a set of data for which
most of the cells contain entries. Table 9-11-5, containing a tabulation
of the score made by students on an entrance examination and their
subsequent grade averages, is of this sort. The procedure for computing $r$
by this method is shown in full in the table, and the specific steps are
described below.

1. Make up a computing form like the one in Table 9-11-5, allowing
enough cells for the number of classes you have chosen. Enter the
frequency in each cell.

2. Assign values of $u$ and $v$, and compute the entries in the $f$, $vf$, and
$v^2 f$ columns in the same way as for the First Method.

3. Multiply each frequency in the first row of the table by the
corresponding value of $u$, and add the results for the row. Write the sum in
the $\Sigma uf$ column. Repeat this operation for each of the remaining rows.
Example: The frequencies in the first row are 2, 5, 1, and 1. The
corresponding values of $u$ are $-2$, $-1$, 1, and 3. The products of these in pairs
are $-4$, $-5$, 1, and 3. The sum of these products is $-5$, which we write
in the $\Sigma uf$ column.

4. Multiply each of these entries in the $\Sigma uf$ column by the corresponding
value of $v$ for that row, and write the result in the $v\Sigma uf$ column.
Example: The first $\Sigma uf$ is $-5$, and the corresponding $v$ is $-2$. We write
their product, 10, in the $v\Sigma uf$ column.

5. Add the $f$, $vf$, $v^2 f$, $\Sigma uf$, and $v\Sigma uf$ columns, and write the sums at
the bottoms of the columns.

6. Compute the entries in the $f$, $uf$, and $u^2 f$ rows at the bottom of the
table.


Table 9-11-5. Second Method for Grouped Data

$$\bar{v} = 252/529 = 0.476 \qquad \overline{v^2} = 738/529 = 1.395$$

$$\bar{u} = 78/529 = 0.147 \qquad \overline{u^2} = 488/529 = 0.922$$

$$\overline{uv} = 259/529 = 0.490$$

$$\sigma_u = \sqrt{0.922 - (0.147)^2} = 0.949$$

$$\sigma_v = \sqrt{1.395 - (0.476)^2} = 1.081$$

$$r = \frac{0.490 - (0.147)(0.476)}{(0.949)(1.081)} = 0.409$$
7. Multiply each frequency in the first column by its corresponding $v$,
and add the results. Write the sum at the bottom of the column opposite
$\Sigma vf$. Repeat for each of the other columns. Example: For the first
column we have $-2$ times 2, plus $-1$ times 5, plus 0 times 4, plus 2 times
1. The total of these is $-7$, which we enter in the $\Sigma vf$ row.

8. Multiply each value of $\Sigma vf$ by the corresponding value of $u$.
Example: In the first column, $\Sigma vf$ is $-7$ and $u$ is $-2$; their product is 14,
which we enter in the $u\Sigma vf$ row.

9. Add each of the rows $f$, $uf$, $u^2 f$, $\Sigma vf$, and $u\Sigma vf$, and write the sum at
the right end of each row. A valuable check on the computations is
provided by the fact that $N$, $\Sigma u$, $\Sigma v$, and $\Sigma uv$ are each found in two ways
in this method.

10. Divide all these sums and all the sums of the vertical columns by $N$.
These ratios give us the values of $\bar{u}$, $\bar{v}$, $\overline{u^2}$, $\overline{v^2}$, and $\overline{uv}$, as shown at the bottom
of the table.

11. Compute $\sigma_u$ from equation 4-5-6:

$$\sigma_u = \sqrt{\overline{u^2} - \bar{u}^2}$$

Similarly, compute $\sigma_v$. In the example we obtain 0.949 and 1.081 for
these two quantities.

12. Substitute $\overline{uv}$, $\bar{u}$, $\bar{v}$, $\sigma_u$, and $\sigma_v$ in equation 9-11-1:

$$r = \frac{\overline{uv} - \bar{u}\bar{v}}{\sigma_u \sigma_v}$$

The computation is shown at the bottom of the table. The final result is

$$r = +0.409$$
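The row-and-column bookkeeping of the Second Method, and the check mentioned in step 9, can be sketched in Python. For want of the cell frequencies of Table 9-11-5, the sketch is applied instead to the smaller tabulation of Table 9-11-4 (the twenty-eight students of the First Method); the list names are our own.

```python
# Frequencies of Table 9-11-4: each row is a y class (v = -2 ... 3),
# each column an x class (u = -2 ... 3).
freq = [
    [1, 1, 0, 0, 0, 0],   # v = -2
    [1, 2, 0, 0, 0, 0],   # v = -1
    [0, 1, 5, 1, 0, 0],   # v =  0
    [0, 0, 3, 1, 0, 0],   # v =  1
    [0, 0, 0, 5, 4, 0],   # v =  2
    [0, 0, 0, 0, 1, 2],   # v =  3
]
u_values = [-2, -1, 0, 1, 2, 3]
v_values = [-2, -1, 0, 1, 2, 3]

# Steps 3 to 5: for each row form the sum of u*f, then multiply by v.
row_sum_uf = [sum(u * f for u, f in zip(u_values, row)) for row in freq]
sum_uv_by_rows = sum(v * s for v, s in zip(v_values, row_sum_uf))

# Steps 7 to 9: for each column form the sum of v*f, then multiply by u.
columns = list(zip(*freq))
col_sum_vf = [sum(v * f for v, f in zip(v_values, col)) for col in columns]
sum_uv_by_cols = sum(u * s for u, s in zip(u_values, col_sum_vf))

print("sum of uf:", sum(row_sum_uf), " sum of vf:", sum(col_sum_vf))
print("sum of uv by rows:", sum_uv_by_rows, " by columns:", sum_uv_by_cols)
```

The two routes to $\Sigma uv$ must agree, and both must agree with the cell-by-cell total of the First Method; a disagreement signals an arithmetical slip somewhere in the form.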

PROBLEMS* 

The following problems should be solved with the data for twenty-five students
given in Table 1-4-4. It is suggested that you use the following class limits, in
order to obtain an exact check with the answers given in the back of the book:
column 1, 15 to 17, 18 to 21, etc.; columns 2 and 3, 40 to 49, 50 to 59, etc.; column
4, 140 to 159, 160 to 179, etc.

1. Compute the coefficient of correlation between the entrance test score and
the subsequent mathematics grade.

2. Compute the coefficient of correlation between the entrance test score and
the subsequent language grade.

3. Compute the coefficient of correlation between the experimental test score
and the subsequent mathematics grade.

4. Compute the coefficient of correlation between the experimental test score
and the subsequent language grade.

5. With the above results, answer all parts of Question X, Chapter 1, Article 4.

6. Compute the coefficient of correlation between the two test scores.

7. Compute the coefficient of correlation for any of the data which you gathered
for Problem 2 of Article 1.

*Retain your results of these computations for future use.
12. REGRESSION EQUATIONS FOR GROUPED DATA

When either of the above procedures has been used to obtain r, and the
investigator wishes to obtain the regression equations and the standard
error of estimate as well, the operational procedure is continued from this
point by the methods described in Articles 7 and 8. For the purpose of
presenting a unified illustration, we will carry the example in Article 11
through the remaining necessary steps, numbering them serially to continue
the sequence of steps in the preceding paragraph.

13. Compute $\bar{x}$ and $\bar{y}$ from equations 4-5-4:

$$\bar{x} = x_0 + C_x\bar{u} = 164 + 3(0.147) = 164.44$$

$$\bar{y} = y_0 + C_y\bar{v} = 75 + 5(0.476) = 77.38$$

14. Compute $\sigma_x$ and $\sigma_y$ from 4-5-7:

$$\sigma_x = C_x\sigma_u = 3(0.949) = 2.85$$

$$\sigma_y = C_y\sigma_v = 5(1.081) = 5.40$$

15. Write the regression equation (9-7-1):

$$y_p = 77.38 + \frac{5.40}{2.85}(0.409)(x - 164.44)$$

or, simplifying,

$$y_p = 0.77x - 50.1$$

16. Compute $S_y$ from 9-9-1:

$$S_y = \sigma_y\sqrt{1 - r^2} = 5.40\sqrt{1 - (0.409)^2} = 4.93$$
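
Steps 13 to 16 amount to undoing the coding and then applying equations 9-7-1 and 9-9-1. The following Python sketch is one way to express this; the function name and the dictionary keys are hypothetical. Because it does not round the intermediate values as the hand computation does, the slope and intercept can differ from the text in the last figure.

```python
import math


def decode_regression(u_bar, v_bar, sigma_u, sigma_v, r, x0, cx, y0, cy):
    """Convert coded results back to x and y units and form the regression line."""
    x_bar = x0 + cx * u_bar              # step 13
    y_bar = y0 + cy * v_bar
    sigma_x = cx * sigma_u               # step 14
    sigma_y = cy * sigma_v
    slope = r * sigma_y / sigma_x        # step 15, slope of y_p on x
    intercept = y_bar - slope * x_bar
    s_y = sigma_y * math.sqrt(1 - r ** 2)  # step 16
    return {
        "x_bar": x_bar, "y_bar": y_bar,
        "sigma_x": sigma_x, "sigma_y": sigma_y,
        "slope": slope, "intercept": intercept, "s_y": s_y,
    }


result = decode_regression(0.147, 0.476, 0.949, 1.081, 0.409,
                           x0=164, cx=3, y0=75, cy=5)
for name, value in result.items():
    print(name, round(value, 2))
```

The standard error of estimate comes out at 4.93, as in step 16.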


PROBLEMS 

1. Compute r for the following data, using the Second Method of Article 11:

                                       Husband's Age
Wife's Age  15 to 24  25 to 34  35 to 44  45 to 54  55 to 64  65 to 74  75 to 84  85 to 94
15 to 24          15        12         2         -         -         -         -         -
25 to 34           -        38        26         8         2         -         2         -
35 to 44           -         2        31        30         5         1         -         -
45 to 54           1         -         9        40        24        14         2         -
55 to 64           -         -         1         8        19        17         2         1
65 to 74           -         -         -         2         1        21         6         -
75 to 84           -         -         -         -         -         -         8         1
85 to 94           -         -         -         -         -         -         -         2

(A dash marks an empty cell.)

2. Write the regression equation for estimating the wife’s age when the husband’s
age is known.

3. Find the standard error of estimate.

4. Write the regression equation for estimating the husband’s age if the wife’s
age is known.

5. Estimate the wife’s age if it is known only that the husband’s age is 53.

6. Compute the probability that in Problem 5 the wife is under 30.

13. GRAPHICAL METHOD 

Many statisticians prefer to work with graphical procedures when
speed is desirable and high accuracy is not necessary. Others prefer to
begin with a graphical procedure even when an exact computation is
planned as well, in order to verify the assumption that a straight-line
equation is suitable for the data, and to have a check upon the subsequent
computations. From the point of view of the student, the method is well
worth studying in any case, since it provides additional insight into the
nature of the regression equations and the coefficient of correlation.

As a practical procedure, the method is particularly time-saving when
only the regression equations and the coefficient of correlation are desired
and is less so when the standard error of estimate is also wanted.

In order to be able to compare the graphical results with those obtained
by the exact procedures, let us take as an example the problem in Table
9-11-5, which we have already worked by the other method. The new
procedure is shown in Table 9-13-1. We begin by computing the average
value of x for each row and the average value of y for each column. Let
us denote the row averages of x by $\bar{x}_r$ and the column averages of y
by $\bar{y}_c$. These are computed in any of the customary ways. For example,
we can use the formula $\bar{x} = \Sigma fx/N$, and obtain, for the $\bar{x}_r$
of the first row,

$$\bar{x}_r = \tfrac{1}{9}(2 \times 158 + 5 \times 161 + 1 \times 167 + 1 \times 173) = 162.3$$

The results of this computation are shown in the right-hand column of
Table 9-13-1.
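
The row and column averages can be checked with a short Python sketch. The frequencies are those of Table 9-13-1; the function names are hypothetical, and the formula used is simply $\Sigma fx/N$ applied to each row and $\Sigma fy/N$ applied to each column.

```python
X = [158, 161, 164, 167, 170, 173]   # entrance exam scores (columns)
Y = [65, 70, 75, 80, 85, 90, 95]     # grade averages (rows)
FREQ = [
    [2, 5, 0, 1, 0, 1],
    [5, 41, 34, 5, 0, 0],
    [4, 46, 81, 46, 5, 1],
    [0, 22, 84, 49, 18, 1],
    [1, 0, 23, 23, 7, 2],
    [0, 3, 4, 8, 4, 0],
    [0, 0, 1, 1, 0, 1],
]


def row_means(freq, x):
    """Average x within each row, weighting each score by its frequency."""
    return [sum(f * xi for f, xi in zip(row, x)) / sum(row) for row in freq]


def column_means(freq, y):
    """Average y within each column, weighting each grade by its frequency."""
    columns = list(zip(*freq))
    return [sum(f * yi for f, yi in zip(col, y)) / sum(col) for col in columns]


print([round(m, 1) for m in row_means(FREQ, X)])
# [162.3, 162.4, 164.1, 165.1, 166.2, 166.1, 168.0]
print([round(m, 1) for m in column_means(FREQ, Y)])
# [72.1, 74.1, 77.5, 79.4, 81.5, 80.8]
```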

Now suppose that an incoming student scores 173 on his entrance
examination and we wish to predict the grade average which he will earn
if he remains in school. We see from our table that the 6 students who
made this score had a subsequent grade average of 80.8, and, if we had no
other data, this would be the best possible estimate for the future grade
average of the new student. The chief weakness of this prediction is that
it is based upon the records of only six students, and any conclusion based
upon so few cases is very vulnerable to accidents of random selection.

Table 9-13-1. Computations for Graphical Procedure

                 Entrance Exam Score (x)
    y    158    161    164    167    170    173      f    x̄_r
   65      2      5      -      1      -      1      9  162.3
   70      5     41     34      5      -      -     85  162.4
   75      4     46     81     46      5      1    183  164.1
   80      -     22     84     49     18      1    174  165.1
   85      1      -     23     23      7      2     56  166.2
   90      -      3      4      8      4      -     19  166.1
   95      -      -      1      1      -      1      3  168.0
    f     12    117    227    133     34      6
  ȳ_c   72.1   74.1   77.5   79.4   81.5   80.8

(A dash marks an empty cell.)

This weakness can be avoided if we base the prediction also upon the
adjacent columns. To do this, we plot the values of $\bar{y}_c$ against the
corresponding values of x and draw a smooth curve which fits all points as
well as possible. We then read off the value of $\bar{y}_c$ from this curve for
the given value of x, and thus obtain a “smoothed” prediction which is not so
much affected by accidental groups of unusual grades. If the points, when
plotted, do not exhibit enough curvature for the investigator to be certain
that it exists, then a straight line should be drawn instead of a curve, and,
in what follows, we will assume that a straight line has been drawn.

In Figure 9-13-1, the values of $\bar{y}_c$ are plotted against the
corresponding values of x, and the resulting points have been indicated by
large dots. (The other curve, whose points are indicated by crosses, does not
concern us here and will be discussed later.) A line has been fitted to the
dots by visual inspection. In fitting this line, the author has attempted to
give more weight to the middle four points, since these depend upon more
cases.

Figure 9-13-1. The Regression Lines. (Horizontal axis: entrance exam score.)

If we use this line to predict the grade average of the new student whose
entrance score was 173, we find that y at this point is 83.0, which is
therefore the best prediction we can make. If another entering student makes
a score of 168, his predicted grade average will be 79.8. In short, this line
is the line relating any value of x to the corresponding predicted value of
y and is therefore identical in principle with the line of regression of y on x,
described in Article 7. In practice the graphical line will probably not
coincide exactly with the computed line, but any difference between them
will be small if the line has been properly drawn. In any case the slope of
the graphical line provides a good approximation to the slope of the
theoretical line. The equation of the theoretical line is, from 9-7-5,

$$y_p = \bar{y} + b_{yx}(x - \bar{x})$$

and this equation must then be the equation of the graphical line, or nearly
so. We can therefore write,

$$\bar{y}_c = \bar{y} + b_{yx}(x - \bar{x}) \tag{9-13-1}$$

Our objective is to obtain from the graph a measurement of the quantity
$b_{yx}$. For this purpose, let us mark any two points on the line and label
their coordinates $x_1$ and $(\bar{y}_c)_1$, and $x_2$ and $(\bar{y}_c)_2$,
respectively. These two points may be anywhere on the line, but it is somewhat
more convenient to select points which are not too close together and which
differ by a simple number in their x coordinates. The two points selected in
the illustrative problem are marked by arrows, and their coordinates are as
follows: $x_1 = 162$, $(\bar{y}_c)_1 = 75.8$; $x_2 = 172$, $(\bar{y}_c)_2 = 82.5$.

Now since both of these points are on the line, they both must satisfy
equation 9-13-1. In other words, it must be true that

$$(\bar{y}_c)_1 = \bar{y} + b_{yx}(x_1 - \bar{x})$$

and

$$(\bar{y}_c)_2 = \bar{y} + b_{yx}(x_2 - \bar{x})$$

If we subtract the first of these from the second and solve for $b_{yx}$, we have

$$b_{yx} = \frac{(\bar{y}_c)_2 - (\bar{y}_c)_1}{x_2 - x_1} \tag{9-13-2}$$

or, for the two points selected in our example,

$$b_{yx} = \frac{82.5 - 75.8}{172 - 162} = 0.67$$

This number is the coefficient of regression of y on x, described in Article 7.
It is a quantity which describes the line of regression, and it is independent
of the choice of the two points whose coordinates are used to compute it.
If the reader is not experienced in computing, he should note that either
the numerator or the denominator, or both, may turn out to be negative
and that errors in sign are easy to make in this computation.
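
Equation 9-13-2 and the warning about signs can be illustrated with a minimal Python sketch. The function name is hypothetical; the two points are the ones read from the graph in the text. Taking the points in the opposite order makes both numerator and denominator negative, and the quotient is unchanged.

```python
def regression_slope(point_1, point_2):
    """Slope of the line through two (x, y) points, as in equation 9-13-2."""
    (x1, y1), (x2, y2) = point_1, point_2
    if x1 == x2:
        raise ValueError("the two points must have different x coordinates")
    return (y2 - y1) / (x2 - x1)


p1, p2 = (162, 75.8), (172, 82.5)
b_yx = regression_slope(p1, p2)
b_swapped = regression_slope(p2, p1)   # both differences change sign together

print(round(b_yx, 2))       # 0.67
print(round(b_swapped, 2))  # 0.67
```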

Now let us consider the inverse problem. If we know the value of y,
how can we best estimate the corresponding value of x? For example, if
a given student did not take the entrance examination, but made a
subsequent grade average of 83.0, what is the best estimate of the score which
he would have made on the entrance examination if he had taken it?
The role of the two variables is now reversed, and we need an average
value of x for all the students who subsequently earned a grade of y; in
other words, we need a line obtained by plotting $\bar{x}_r$ against y. Such a
graph is shown in Figure 9-13-1, where the points are indicated by crosses
to distinguish them from the points of the previously discussed line. A
straight line (dotted) has been drawn in by visual inspection and labelled
$\bar{x}_r$, y. The desired estimate is now made by finding the point on this
line where y equals 83.0, and reading the corresponding value of x. This value
is 165.3, which is consequently the best possible estimate of the score which
the student would have made on the entrance examination.

Since the dotted line shows the relation between each value of y and
the best estimate of the corresponding value of x, it is identical in principle
with the line of regression of x on y discussed in Article 7. Any two points
on the dotted line must then satisfy equation 9-7-6. If we select two such
points and call their coordinates $(\bar{x}_r)_1$, $y_1$, $(\bar{x}_r)_2$, and
$y_2$, we can therefore write

$$(\bar{x}_r)_1 = \bar{x} + b_{xy}(y_1 - \bar{y})$$

and

$$(\bar{x}_r)_2 = \bar{x} + b_{xy}(y_2 - \bar{y})$$

and, proceeding as before,

$$b_{xy} = \frac{(\bar{x}_r)_2 - (\bar{x}_r)_1}{y_2 - y_1} \tag{9-13-3}$$

Two such points have been selected in Figure 9-13-1 and marked with
arrows. Their coordinates are as follows: $(\bar{x}_r)_1 = 161.9$, $y_1 = 65.0$,
$(\bar{x}_r)_2 = 167.6$, and $y_2 = 95.0$. When these values are inserted in
equation 9-13-3, we have

$$b_{xy} = \frac{167.6 - 161.9}{95.0 - 65.0} = 0.19$$

This quantity is the coefficient of regression of x on y, described in Article 7.

The two lines in the diagram can be used for estimating or predicting,
either directly in the graphical form or in the form of the regression
equations. There remains the objective of finding the coefficient of correlation
between x and y. An inspection of equations 9-7-3 and 9-7-4 shows that
this objective can be readily achieved by multiplying the two regression
coefficients together and finding the square root of the result:*

$$r = \sqrt{b_{yx}\,b_{xy}} \tag{9-13-4}$$

In our example,

$$r = \sqrt{(0.67)(0.19)} = 0.36$$

This is to be compared with the value of 0.409 obtained by the exact
method described in Article 11. The agreement is only fair, and illustrates
the limitations of the graphical procedure. In general, the method
will work well if the points fall nearly upon a straight line and are of nearly
equal weight and will work less well if there is a large scatter of the plotted
points or if they are of markedly unequal weight.
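
The last step of the graphical method can be sketched in Python as follows. The function name is hypothetical. The sketch also assumes, as an addition not stated in the passage above, that r is given the common sign of the two regression coefficients, since the square root alone is always positive.

```python
import math


def r_from_slopes(b_yx, b_xy):
    """Coefficient of correlation as the square root of the product of slopes."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("the two regression coefficients must have the same sign")
    # Assumption: r carries the sign shared by the two slopes.
    return math.copysign(math.sqrt(product), b_yx)


b_yx = (82.5 - 75.8) / (172 - 162)        # slope read from the solid line
b_xy = (167.6 - 161.9) / (95.0 - 65.0)    # slope read from the dotted line
r_graphical = r_from_slopes(b_yx, b_xy)

print(round(b_yx, 2), round(b_xy, 2), round(r_graphical, 2))  # 0.67 0.19 0.36
```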

The graphical procedure has the advantage that the degree of correlation
becomes apparent in a general way as soon as the two sets of points are
plotted. If the correlation is high, the two lines will nearly coincide, while
if it is low, one line will be nearly vertical and the other nearly horizontal.
To secure best results from the graphical procedure, the following points
should be kept in mind:

1. In view of the principle of least squares, one large deviation is more
to be avoided than two deviations each half as large. It is wrong to secure
a perfect fit with almost all the points at the expense of a large misfit for
the remaining one or two points.

2. Not all the points have the same weight, since some of them depend
upon many cases and others upon only a few. Try to secure the best fit
with the points which represent most data.

3. If the points exhibit an excessive and confusing scatter, it is sometimes
advantageous to replace two points by one master point halfway
between them (or nearer one than the other if the two are of unequal
weights), and thus to reduce the number of points to be visualized. Only
points arising from adjacent columns or rows should be combined. This
process of combination should be used with caution and should be carried
only far enough to make it possible to visualize a good line of best fit.

4. Since the process depends upon a sort of visual balancing of a number
of points on your diagram, it is well to make these points conspicuous.
They should be sufficiently large and black so that they stand out above
any writing or labeling on the diagram, and so that the entire pattern can
be taken in at a single glance.
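
One way to place the master point of suggestion 3 is sketched below in Python. It assumes that "nearer one than the other" means a weighted average, with the number of cases behind each point as its weight; the text does not prescribe this rule, and the function name is hypothetical. The example combines the two right-hand columns of Table 9-13-1 (34 cases and 6 cases).

```python
def master_point(point_a, weight_a, point_b, weight_b):
    """Weighted average of two (x, y) points; weights are numbers of cases."""
    total = weight_a + weight_b
    x = (weight_a * point_a[0] + weight_b * point_b[0]) / total
    y = (weight_a * point_a[1] + weight_b * point_b[1]) / total
    return x, y


# Columns x = 170 (34 cases, mean grade 81.5) and x = 173 (6 cases, mean 80.8).
x_master, y_master = master_point((170, 81.5), 34, (173, 80.8), 6)
print(round(x_master, 2), round(y_master, 2))
```

The combined point lies much closer to the column with 34 cases, which is the behavior the suggestion describes.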

PROBLEMS 

1. Using the graphical procedure, find the equations of the regression lines for
the data in Problem 1 of Article 12. Compute from the slopes of these lines the
coefficient of correlation. Compare your graphical results with the exact results
which you obtained in Article 11.

14. CORRELATION BY RANKS 

In many cases an investigator finds himself dealing with data of a
non-quantitative sort, in which the items are arranged in order of size or
quality, although the exact size or quality of each is unknown or even
unmeasurable. A voter may be fully confident, for example, that he likes
Candidate A better than Candidate B, and Candidate C better than either.
Or, again, it may be clear from the record that F is a better fighter than
G, who in turn is a better fighter than H. To try to measure the exact
amount of the voter’s liking for C, or to estimate the quantity of F’s fighting
ability, would be difficult or absurd. The methods of correlation can
however be readily adapted to such cases in spite of the non-numerical
nature of the data, and, in fact, a considerable simplification of procedure
is possible.

To illustrate the procedure, let us use the following data. Ten men
compete in elimination tournaments in both golf and tennis, so that they
can be listed in order of ability in each sport.


Player       Rank in Tennis (m)   Rank in Golf (n)
Jones                 4                  2½
Brubaker              1                  2½
Connor               10                  7
Harrison              7                  9
Linteau               9                 10
Grady                 2                  1
Orsini                3                  4
Deutsch               8                  5
Rodriguez             6                  7
Swenson               5                  7

Brubaker won the tennis championship and so has rank “one” in tennis,
Grady came in second and so has rank 2, and so forth. In golf, Jones and
Brubaker were tied for second place. If one had beaten the other, they
would have had ranks 2 and 3, so, since they were tied, we assign each of
them the average of these two ranks, or 2½. In the same way, Connor,
Rodriguez, and Swenson were involved in a three-way tie and would
otherwise have had ranks 6, 7, and 8, so we assign each of them the average of
these three numbers, or 7. Data of this sort are called ranked data.
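
The rule for tied ranks can be sketched in Python. The function name is hypothetical, and the input format is an assumption made for illustration: each player is given the place in which he finished, with tied players sharing the same place (so the golf places run 1, 2, 2, 4, 5, 6, 6, 6, 9, 10).

```python
def average_ranks(places):
    """Replace each group of tied places by the average of the ranks it spans."""
    order = sorted(range(len(places)), key=lambda i: places[i])
    ranks = [0.0] * len(places)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next player has the same place.
        while end + 1 < len(order) and places[order[end + 1]] == places[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2   # average of the ranks spanned
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Golf places in the order Jones, Brubaker, Connor, Harrison, Linteau,
# Grady, Orsini, Deutsch, Rodriguez, Swenson.
golf_places = [2, 2, 6, 9, 10, 1, 4, 5, 6, 6]
golf_ranks = average_ranks(golf_places)
print(golf_ranks)
# [2.5, 2.5, 7.0, 9.0, 10.0, 1.0, 4.0, 5.0, 7.0, 7.0]
```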

In order to apply our equations to this problem, let us make the
assumption that ability in tennis is divided in ten equal steps, that is, that
Brubaker is just as much better than Grady as Grady is better than
Orsini, and so on through the set of ten men. In other words, if x measures
tennis ability, and m is a player’s rank in tennis, we assume that
$x = x_0 + mC_x$, where $C_x$ is the difference in ability between successive
ranks, and $x_0$ is a suitable starting point of measurement. This equation is
purely formal, to be used only for the derivation of the formula for the
coefficient of correlation for ranked data, and it is not necessary to assign
a value to $x_0$ or to $C_x$. In the same way, if y is a measure of a player’s
ability in golf, and n is his rank, we assume that $y = y_0 + nC_y$. If we now
average these values of x and y and form the deviations from the mean, we
will have

$$x - \bar{x} = C_x(m - \bar{m})$$

and

$$y - \bar{y} = C_y(n - \bar{n})$$

Let us now insert these values in 9-6-2:

$$r = \frac{\overline{C_x(m - \bar{m})\,C_y(n - \bar{n})}}{C_x\sigma_m\,C_y\sigma_n}$$

or, by the same procedure which we used in deriving 9-11-1,

$$r = \frac{\overline{mn} - \bar{m}\,\bar{n}}{\sigma_m\sigma_n} \tag{9-14-1}$$

This formula can be simplified if we introduce the difference between
tennis rank and golf rank for each player. We observe that

$$\overline{(m - n)^2} = \overline{(m^2 - 2mn + n^2)} = \overline{m^2} - 2\,\overline{mn} + \overline{n^2} \tag{9-14-2}$$

or, solving for $\overline{mn}$,

$$\overline{mn} = \tfrac{1}{2}\overline{m^2} + \tfrac{1}{2}\overline{n^2} - \tfrac{1}{2}\overline{(m - n)^2}$$

If we substitute this in 9-14-1, we will have*

$$r = \frac{\tfrac{1}{2}\overline{m^2} + \tfrac{1}{2}\overline{n^2} - \tfrac{1}{2}\overline{(m - n)^2} - \bar{m}\,\bar{n}}{\sqrt{\overline{m^2} - \bar{m}^2}\,\sqrt{\overline{n^2} - \bar{n}^2}} \tag{9-14-3}$$

where we have also made the usual substitution (4-4-1) for $\sigma_m$ and
$\sigma_n$. We can simplify this by using equations 3-11-1 and 3-11-2,

$$\bar{m} = \bar{n} = \frac{N + 1}{2}$$

$$\overline{m^2} = \overline{n^2} = \frac{(N + 1)(2N + 1)}{6}$$

where N is the number of items in the ranked data. (In our example,
N = 10.) Making these substitutions in 9-14-3 and simplifying the
resulting expression, we have

$$r = 1 - \frac{6\,\overline{(m - n)^2}}{N^2 - 1} \tag{9-14-4}$$

This is customarily written with $\overline{(m - n)^2}$ replaced by
$\Sigma(m - n)^2/N$:

$$r = 1 - \frac{6\,\Sigma(m - n)^2}{N(N^2 - 1)} \tag{9-14-5}$$

which is the standard formula for correlation by ranks.
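
The simplification that leads from equation 9-14-3 to equation 9-14-4 can be spelled out as follows. This is a sketch that assumes there are no tied ranks, so that each set of ranks is exactly 1, 2, ..., N; the symbol D is introduced here only as shorthand for the mean of $(m - n)^2$. Since the two sets of ranks have the same mean and the same mean square, the numerator of 9-14-3 is the variance of the ranks less D/2, and the denominator is the variance of the ranks.

```latex
\begin{aligned}
\overline{m^2} - \bar{m}^2
  &= \frac{(N+1)(2N+1)}{6} - \frac{(N+1)^2}{4}
   = \frac{(N+1)(N-1)}{12}
   = \frac{N^2 - 1}{12} \\[4pt]
r &= \frac{\dfrac{N^2 - 1}{12} - \dfrac{D}{2}}{\dfrac{N^2 - 1}{12}}
   = 1 - \frac{6D}{N^2 - 1},
   \qquad D = \overline{(m - n)^2}
\end{aligned}
```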

For our example the necessary computations are shown in Table 9-14-1,

Table 9-14-1. Correlation by Ranks

    m       n     |m − n|   (m − n)²
    4      2½       1½         2¼
    1      2½       1½         2¼
   10       7        3          9
    7       9        2          4
    9      10        1          1
    2       1        1          1
    3       4        1          1
    8       5        3          9
    6       7        1          1
    5       7        2          4
                    Sum        34½

from which we find that $\Sigma(m - n)^2$ is 34½. Inserting this in equation
9-14-5,

$$r = 1 - \frac{6(34.5)}{10(10^2 - 1)} = 0.791$$
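
Equation 9-14-5 is short enough to check with a few lines of Python. The function name is hypothetical; the ranks are those of the ten players, in the order in which they are listed in the table of players.

```python
def rank_correlation(m, n):
    """Coefficient of correlation by ranks, equation 9-14-5."""
    if len(m) != len(n):
        raise ValueError("both lists of ranks must have the same length")
    count = len(m)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(m, n))
    return 1 - 6 * sum_sq_diff / (count * (count ** 2 - 1))


tennis = [4, 1, 10, 7, 9, 2, 3, 8, 6, 5]
golf = [2.5, 2.5, 7, 9, 10, 1, 4, 5, 7, 7]

print(sum((a - b) ** 2 for a, b in zip(tennis, golf)))  # 34.5
print(round(rank_correlation(tennis, golf), 3))         # 0.791
```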


This formula is so easy to use and so rapid in computation that it is
sometimes used to secure an approximate value of r even for cases in which
the values of x and y are accurately known. We will illustrate this by
applying it to the data in Article 5 of this chapter. We assign a rank to
each student in hours of study and examination grade, as shown in Table
9-14-2.

Table 9-14-2. Comparison with Exact Method

Name        x     y     m     n    |m − n|   (m − n)²
Jackson     5    26     1     1       0         0
Petoskey    4    17     2    2½       ½         ¼
Goldberg    3    17     3    2½       ½         ¼
Bellini     2    11     4     5       1         1
Schwartz    1    14     5     4       1         1
                                     Sum        2½

We then find for r,

$$r = 1 - \frac{6(2.5)}{5(5^2 - 1)} = 0.875$$

This is to be compared to the value 0.845 obtained by more exact means
in Article 5. The agreement is very close, because the actual steps in x
are equal to each other, and those in y are nearly equal to each other, so
that the assumptions made in deriving 9-14-5 are nearly fulfilled. In
general we cannot expect such close agreement. If a precise value of r
is needed, the method of ranks should of course not be used.
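
The comparison between the rank formula and the exact coefficient can be reproduced with the following Python sketch. The data are the x and y columns of Table 9-14-2; the function names are hypothetical, and the exact coefficient is computed here from deviations about the means, which is one of several equivalent forms.

```python
import math

hours = [5, 4, 3, 2, 1]            # x in Table 9-14-2
grades = [26, 17, 17, 11, 14]      # y in Table 9-14-2
rank_hours = [1, 2, 3, 4, 5]       # m
rank_grades = [1, 2.5, 2.5, 5, 4]  # n, with the tie averaged


def exact_r(x, y):
    """Coefficient of correlation computed from deviations about the means."""
    count = len(x)
    x_bar, y_bar = sum(x) / count, sum(y) / count
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_r(m, n):
    """Equation 9-14-5."""
    count = len(m)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(m, n)) / (count * (count ** 2 - 1))


print(round(exact_r(hours, grades), 3))          # 0.845
print(round(rank_r(rank_hours, rank_grades), 3))  # 0.875
```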

PROBLEMS 

1. The manager of an office believes that Mr. Bjorkman is his most valuable
assistant, followed by Dimitroff, Siebert, Smart, Kendall, Barrett, and Turner, in
that order. Bjorkman has been in the office for 14 years, Dimitroff 7 years, Kendall
and Turner 5 years, Siebert 3 years, and Barrett and Smart 2 years. Using the
method of correlation by ranks, find the coefficient of correlation between value
and length of service.

2. Five friends, Allison, Daily, Johnson, Manchetti, and Wojicki, held a
tournament in tennis and another tournament in chess, with the following results:

             Chess                          Tennis
Game   Won    Tie     Lost      Game   Won    Tie     Lost
  1     D              A          1     J              D
  2     W              J          2     M              A
  3           W, D                3     A              W
  4           J, A                4     J              W
  5     M              J          5           M, J
  6     D              J          6     M              W
  7     D              M          7     W              D
  8           A, M                8     A              D
  9     W              M          9     J              A
 10     W              A         10     M              D

Find the coefficient of correlation between ability in chess and ability in tennis.
(Hint: Notice that Manchetti has higher rank than Allison in chess, even though
they tied, because Manchetti defeated Johnson and Johnson tied Allison.)

15. COEFFICIENT OF CORRELATION FOR AN UNKNOWN UNIVERSE 

In the material presented so far in this chapter, we have considered the
coefficient of correlation to be a number describing the relationship between
the set of x’s and y’s in our tables of data. As such it is an exact measure,
and no question can arise as to how well it fits the data or as to its range
of uncertainty. We are usually interested, however, not only in describing
the data in hand, but in drawing conclusions about the universe from
which the data came, and in this case we must consider the question of
how well a value of r computed from a sample can be expected to describe
the universe from which the sample was drawn. To discuss this question
in detail would take us out of the field of elementary statistics, and we will
content ourselves with two statements which we will present here without
proof.

(1) The coefficient of correlation for the universe can be expected to be
a little smaller than that of the sample. The reason for this can be seen if
we recall that the coefficient of correlation measures the closeness of fit
between the points on a scatter diagram and the line of best fit. Now if
there are only a few points, we can adjust the line to fit some of their
accidental deviations, and thus secure a misleadingly good fit. To carry
the argument to its extreme, if we reduce the number of points to two we
can always secure a perfect fit with a straight line, and thus the coefficient
of correlation would always turn out to be exactly 1. To obtain the best
possible estimate of the value which we would find for r if we could study
the entire universe, we must use a formula which removes this effect of
overestimating the correlation when the sample is small. The appropriate
formula is

$$r_u^2 = 1 - (1 - r^2)\,\frac{N - 1}{N - 2} \tag{9-15-1}$$

where $r_u$ is the best estimate of the coefficient of correlation of the
universe, and N is the number of observations contained in the sample. If, for
example, r is 0.5 and N is 10, then the best estimate of the coefficient of
correlation in the universe from which the sample was taken is 0.4. Again,
if r is 0.5 and N is 5, then $r_u$ is zero. Thus we see that if we find a
coefficient of correlation of 0.5 from a sample containing only five items, we
cannot be certain that there is any correlation whatever in the universe,
while if we obtain the same result from a sample of ten, we can assert that
a correlation probably exists in the universe, but its most likely value is
only 0.4, as compared to 0.5 for the sample.
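
The two worked values above can be reproduced with a short Python sketch. It assumes that equation 9-15-1 has the form $r_u^2 = 1 - (1 - r^2)(N - 1)/(N - 2)$, a common small-sample correction which yields both figures quoted in the text; the function name is hypothetical, and a negative value of $r_u^2$ is treated as zero.

```python
import math


def universe_r(r, n):
    """Estimate of the universe correlation, under the assumed form of 9-15-1."""
    if n <= 2:
        raise ValueError("the correction needs more than two observations")
    r_u_squared = 1 - (1 - r ** 2) * (n - 1) / (n - 2)
    # Assumption: a negative result means no detectable correlation.
    return math.sqrt(max(r_u_squared, 0.0))


print(round(universe_r(0.5, 10), 1))  # 0.4
print(round(universe_r(0.5, 5), 1))   # 0.0
```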
Differences in the second decimal of the coefficient of correlation have
little practical meaning, and if $r_u$ differs from r by less than about 0.05 the
correction is scarcely worth applying unless high precision is needed.
With this arbitrary limit in mind, equation 9-15-1 tells us that no
correction is necessary if r is 0.2 or greater and N is 50 or greater, or if r
is 0.5 or greater and N is 20 or greater, or if r is 0.7 or greater and N is 10
or greater. Since most practical problems in correlation fall beyond these
limits, it is generally reasonably safe to assume that the coefficient of
correlation obtained from the sample is also representative of the universe,
and formula 9-15-1 is necessary only if either r or N is very small, or if
high precision is desired.

(2) If a number of samples are drawn from the same universe, the values
of r obtained from them would differ a little from one sample to another.
It can be shown theoretically that such a set of r’s would form a
population whose standard deviation is

$$\sigma_r = \frac{1 - r^2}{\sqrt{N - 1}} \tag{9-15-2}$$

where the r on the right refers to the universe. If, for example, r is 0.5
and N is 101, then $\sigma_r$ is 0.075. This tells a great deal about the
reliability, or range of uncertainty, of r. We know, for instance, that the
probability is 0.68 that our value of r is within 0.075 of the value of r which
we would have obtained by averaging the r’s from a number of samples of 101
each, which we can assume is very near to the r of the universe. This
information can also be expressed in terms of the probable error of r, which is
0.6745 times 0.075, or 0.051. This tells us that the probability is 0.5 that
our value of r is within 0.051 of the value of r for the universe, or, in
standard notation,

$$r_u = 0.50 \pm 0.05$$

where we have ignored the small correction given by equation 9-15-1.
The probability that $r_u$ will lie between any given limits can then be found
by means of the normal curve tables.
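
Equation 9-15-2 and the probable error can be evaluated with a minimal Python sketch; the function names are hypothetical, and the factor 0.6745 is the one quoted in the text.

```python
import math


def sigma_r(r, n):
    """Standard deviation of sample values of r, equation 9-15-2."""
    return (1 - r ** 2) / math.sqrt(n - 1)


def probable_error(r, n):
    """Probable error of r: 0.6745 times its standard deviation."""
    return 0.6745 * sigma_r(r, n)


print(round(sigma_r(0.5, 101), 3))         # 0.075
print(round(probable_error(0.5, 101), 3))  # 0.051
```

As the next paragraph of the text cautions, these figures should be read against the normal curve tables only when N is large and r is moderate.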

If N is small, or if r is large, then the sample r’s will no longer form an
approximately normal distribution, and the normal curve tables can no
longer be used with equation 9-15-2 to compute confidence limits. Roughly
speaking, if N is less than 50 or if r is larger than 0.6, then the skewness of
the distribution of r’s is so great as to make the normal curve tables
unsuitable.

Equations 9-15-1 and 9-15-2 indicate that it is rarely worthwhile to
compute a correlation coefficient with fewer than twenty-five variates,
and it is advisable to have fifty or more to determine r with moderate
accuracy, or several hundred to determine it with high accuracy.
PROBLEMS

1. Compute the best estimate of the coefficient of correlation for the universe,
and its probable error, for the following cases:

(a) r = 0.7, N = 10; (b) r = 0.7, N = 50; (c) r = 0.7, N = 500;

(d) r = 0.2, N = 25; (e) r = 0.2, N = 1000; (f) r = 0.9, N = 5.

In which cases do you think that a real correlation undoubtedly exists in the
universe?

2. Using your answer to Problem 3, Article 6, compute the coefficient of
correlation which you would expect to find in a very large group of similar
children.

3. Compute the probable error of your answer to the preceding problem.

4. Using your answer to Problem 1, Article 11, compute the coefficient of
correlation which you would expect to find in the universe. Compute also the
probable error of your answer.

5. Repeat Problem 4 above, this time using your answer to Problem 1, Article 12.

6. Compute the standard deviation of a value of r obtained from a sample of
five items from Table 10-5-1, using the fact that the value of r for this table
is 0.83. Test your result by choosing ten random samples of five each, computing
the value of r for each sample, and then computing the standard deviation of the
resulting ten values of r. If you can divide the work with other students, repeat
this test with samples of other sizes.

16. SUMMARY 

The procedures described in this chapter have three objectives: (a) to
measure the correlation between two variables, (b) to find an equation
for predicting the value of one variable when the other is known, and (c)
to find the range of uncertainty of such a prediction. These three
objectives are fulfilled as follows:

(a) The correlation is measured by the coefficient of correlation, r, which
ranges from zero when the variables are independent to one when they are
totally dependent and which is positive or negative depending upon whether
y increases or decreases with increasing x.

(b) The task of predicting one variable from known values of the other
is accomplished by means of a pair of regression or prediction equations,
and the subscript “p” is used to indicate which variable is being predicted
from the other.

(c) The uncertainty of prediction is measured by the standard error of
estimate, $S_y$, which is the standard deviation of the errors which the
equation makes in predicting the values of y in the original data, and which
is therefore assumed to be the standard deviation of the errors which will
occur in using the equation to predict values of y corresponding to new
values of x.

CHOICE OF PROCEDURES 

Three main procedures will be described, with subdivisions to be noted
later. The exact procedure is the procedure which should be used in almost
all practical problems, while the other two procedures are specialized
methods for occasional use. The graphical procedure is to be used if
(a) approximate results are sufficient for the needs of the problem and
(b) the investigator wishes to compute only r and the regression equation
and does not wish to find $S_y$. Some investigators carry out the graphical
procedure as well as the exact procedure, in order to have a check upon
their results. The procedure for ranked data is used for data which are
non-quantitative, but which can be arranged in an order of quality or size.
The procedure is also used occasionally as a quick approximate method for
finding r for quantitative data.

EXACT PROCEDURE 

I. Plot the data, or, if it is extensive, plot part of it. This step is not
rigidly necessary, but is recommended as a check upon the validity of the
assumption that the relationship between the two variables can be
described by a linear equation. If there is conspicuous curvature of the
trend of the plotted points, then the methods of this chapter are not
applicable, and Part III of the summary at the end of Chapter 8 should be
used instead.

II. Compute r by whichever of the following procedures is applicable.

A. If a slide rule is to be used, and there are fewer than about twenty-five
variates, then use one of the following procedures.

(1) If the variates are small numbers, use equation 9-6-1:

$$r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x\sigma_y}$$

The procedure is demonstrated in Table 9-6-1.

(2) If the variates are large numbers with a moderate spread, subtract
a fixed number $x_0$ from each value of x and a fixed number $y_0$ from each
value of y, to reduce them to manageable size. Compute r from equation
9-10-1,

$$r = \frac{\overline{(x - x_0)(y - y_0)} - \overline{(x - x_0)}\;\overline{(y - y_0)}}{\sigma_x\sigma_y}$$

in which it is most convenient to compute $\sigma_x$ and $\sigma_y$ from
equation 4-4-3. The procedure is demonstrated in Table 9-10-1. (An alternative
procedure, which is occasionally faster if the means are whole numbers, is
demonstrated in Table 9-10-2.)

B. If a computing machine is to be used, and there are fewer than
about one hundred variates, then use either method 1 or 2 in section A.
Note that it is not necessary to record the individual values of $xy$, $x^2$,
and $y^2$, because these can be accumulated in the machine as they are
computed.

C. If the data are too numerous for the above methods (more than about
twenty-five if a slide rule is to be used; more than about one hundred if a
computing machine is to be used), then form a double frequency tabulation
as shown in Table 9-11-2. Proceed as follows.

1. If most of the cells are empty, as they are for example in Table 9-11-3,
then carry out the procedure which is shown in Table 9-11-4 and which is
described as the First Method in Article 11.

2. If most of the cells are filled, as they are for example in Table 9-11-5,
then carry out the procedure which is shown in that table and which is
described as the Second Method in Article 11.
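
For readers who work with a programming language rather than a slide rule, the computation in part A can be sketched as follows. This is only an illustration: the function names and the five pairs of variates are invented for the example, and the standard deviation is computed as the square root of the mean square minus the squared mean. The sketch shows that subtracting fixed numbers $x_0$ and $y_0$ leaves r unchanged.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def std(values):
    """Standard deviation as sqrt(mean of squares - square of mean)."""
    m = mean(values)
    return sqrt(mean([v * v for v in values]) - m * m)


def correlation(xs, ys, x0=0.0, y0=0.0):
    """Equation 9-10-1; with x0 = y0 = 0 it reduces to equation 9-6-1."""
    u = [x - x0 for x in xs]
    v = [y - y0 for y in ys]
    mean_uv = mean([a * b for a, b in zip(u, v)])
    return (mean_uv - mean(u) * mean(v)) / (std(u) * std(v))


# Hypothetical variates, invented for illustration only.
xs = [1012, 1015, 1019, 1023, 1030]
ys = [508, 511, 509, 516, 520]

r_raw = correlation(xs, ys)                       # equation 9-6-1
r_shifted = correlation(xs, ys, x0=1000, y0=500)  # equation 9-10-1
print(round(r_raw, 3), round(r_shifted, 3))
```

The two results agree because a change of origin alters neither the standard deviations nor the quantity in the numerator.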

III. The regression equation is obtained by inserting the proper values
into equation 9-7-1,

$$y_p = \bar{y} + r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

or, if x is to be estimated or predicted from y, into equation 9-7-2,

$$x_p = \bar{x} + r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

IV. The standard error of estimate is now computed from equation
9-9-1,



This is the standard deviation of the errors to be expected in any
predictions which we obtain from the regression equation.
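
Steps III and IV can be sketched in a few lines of code. The sketch assumes that equation 9-9-1 has the usual form, the standard deviation of y multiplied by the square root of 1 minus r squared; that form agrees with the later statement that 1 - D is the unexplained fraction of the variance, but it is an assumption here. The function names and the numerical values are hypothetical.

```python
from math import sqrt


def predict_y(x, mean_x, mean_y, sd_x, sd_y, r):
    """Regression of y on x, following equation 9-7-1."""
    return mean_y + r * (sd_y / sd_x) * (x - mean_x)


def standard_error_of_estimate(sd_y, r):
    """Assumed form of equation 9-9-1: sd_y * sqrt(1 - r^2)."""
    return sd_y * sqrt(1.0 - r * r)


# Hypothetical summary values, invented for illustration only.
mean_x, mean_y = 50.0, 100.0
sd_x, sd_y = 10.0, 20.0
r = 0.6

predicted = predict_y(60.0, mean_x, mean_y, sd_x, sd_y, r)
s_y = standard_error_of_estimate(sd_y, r)
print(predicted, s_y)
```

With these invented values the prediction lies 12 units above the mean of y, and the standard error of estimate is four-fifths of the standard deviation of y.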

GRAPHICAL PROCEDURE 

I. Form a double frequency tabulation as shown in Table 9-11-2.

II. Compute the x-average for each row and label it $x_r$; compute the
y-average for each column and label it $y_c$, as shown in Table 9-13-1.

III. Plot each value of x against the corresponding value of $y_c$ and draw
a line of best fit by visual inspection, as shown in Figure 9-13-1. Using a
different symbol, plot each value of y against the corresponding value of
$x_r$ and draw a line of best fit. Before drawing these lines, review the
suggestions at the end of Article 13.

IV. Select two points on each line and read their coordinates. Then
compute $b_{yx}$ and $b_{xy}$ from equations 9-13-2 and 9-13-3,

$$b_{yx} = \frac{(y_c)_2 - (y_c)_1}{x_2 - x_1}$$

$$b_{xy} = \frac{(x_r)_2 - (x_r)_1}{y_2 - y_1}$$

V. Compute r from equation 9-13-4,



The value of r so obtained can be checked roughly from the relative
position of the two lines of regression on the graph. If the lines nearly
coincide, then r is very large, while if the lines are nearly at right angles
to each other, then r is very small.

VI. To predict y for a given value of x, either of two procedures can be
followed.

A. Find the value of x on the graph and read the value of $y_c$ from the
proper line. The value of $y_c$ so obtained is the best predicted value of y.
If an estimated or predicted value of x is needed for a known value of y,
use the line relating y to $x_r$.

B. If you prefer to work with a regression equation, read the approximate
values of $\bar{x}$ and $\bar{y}$ from the intersection of the two lines, and insert
these and $b_{yx}$ into equation 9-7-5,

$$y_p = \bar{y} + b_{yx}(x - \bar{x})$$

A similar equation for estimating or predicting x from known values of y
can be obtained from the corresponding equation 9-7-6,

$$x_p = \bar{x} + b_{xy}(y - \bar{y})$$

This procedure is illustrated in Article 13.
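
The arithmetic of steps IV to VI can be sketched as follows. The coordinates are hypothetical readings from two hand-drawn lines, not values from Table 9-13-1. The sketch also assumes that equation 9-13-4 takes r as the square root of the product of the two slopes, which follows from the fact that the slopes of the two regression lines are r times the ratio of the standard deviations and r times its reciprocal.

```python
from math import sqrt, copysign


def slope(point_1, point_2):
    """Slope of a line from two (abscissa, ordinate) readings."""
    return (point_2[1] - point_1[1]) / (point_2[0] - point_1[0])


# Hypothetical readings from the line of y_c against x: (x, y_c).
b_yx = slope((10.0, 22.0), (30.0, 34.0))
# Hypothetical readings from the line of x_r against y: (y, x_r).
b_xy = slope((20.0, 10.4), (40.0, 34.4))

# Assumed form of equation 9-13-4; r carries the sign of the slopes.
r = copysign(sqrt(b_yx * b_xy), b_yx)

# The two lines cross at the means, read here as (20, 28).
mean_x, mean_y = 20.0, 28.0


def predict_y(x):
    """Equation 9-7-5 with the values read from the graph."""
    return mean_y + b_yx * (x - mean_x)


print(round(b_yx, 2), round(b_xy, 2), round(r, 3), predict_y(25.0))
```

Because both slopes come from lines fitted by eye, the resulting r is only as good as the drawing, which is why the rough visual check on the angle between the lines is worth making.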

PROCEDURE FOR RANKED DATA 

I. Assign a rank, or order number, to each variable. Let m be the rank
in x and n the rank in y. In other words, let m = 1 for the entry with the
smallest x, m = 2 for the entry with the next smallest x, and so forth. If
several entries are tied, assign to each of them the average of the ranks
they would have had if they had just missed being tied. (For example, if
two entries are tied for third place, they would otherwise have had ranks
3 and 4; therefore assign both the rank of 3½.)

II. Compute the coefficient of correlation from equation 9-14-5:

$$r = 1 - \frac{6\sum (m - n)^2}{N(N^2 - 1)}$$

An example is shown in Table 9-14-1.
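
The ranking rule and equation 9-14-5 can be sketched as below. The function names are illustrative and the data are invented; the tie rule implemented is the one described in step I, in which tied entries share the average of the ranks they would otherwise have received.

```python
def average_ranks(values):
    """Rank 1 for the smallest value; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the entry at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the positions i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_correlation(xs, ys):
    """Equation 9-14-5 applied to the ranks m (in x) and n (in y)."""
    m = average_ranks(xs)
    n = average_ranks(ys)
    count = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(m, n))
    return 1 - 6 * sum_d2 / (count * (count * count - 1))


# Two entries tied for third place both receive the rank 3.5.
print(average_ranks([10, 20, 30, 30, 50]))
print(rank_correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))
```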

INTERPRETATION 

I. For a quantitative interpretation, r is less valuable than the
coefficient of determination, D, which is equal to the square of r. The
quantity D can be loosely described as the “percentage of relatedness,”
and the corresponding quantity 1 - D can be thought of as the
“percentage of independence.” More exactly, D is the fraction of the total
variance which is predicted or explained by the relationship described by
the regression equation, and 1 - D is the fraction of the total variance
which the regression equation fails to predict or explain.

II. It is important to remember that a large coefficient of correlation
does not mean that changes in x “cause” changes in y. It is equally likely
that y is the causative factor and x the resultant, or that both are
resultants of other causative factors and have no direct effect upon each
other. It is also possible, particularly if r is small, that there is no causal
relationship of any sort between x and y, but that accidents of selection of
data have produced the apparent relationship.

The use of coefficients of correlation in interpreting data is beset with
many pitfalls of logic, and the careless investigator can easily be trapped
into drawing unwarranted conclusions. The safest insurance is a thorough
knowledge of the field which you are investigating, and a habit of carrying
out a “common sense” analysis along with the statistical analysis. Some
specific suggestions in this direction will be discussed in Chapter 13.

III. When a prediction has been made, the standard error of estimate
can be used in any of several ways to express the uncertainty of the
prediction.

(a) $S_y$ is the standard deviation of the expected errors of prediction,
and the probability is therefore 0.68 that the true value of y will differ
from the predicted value by less than $S_y$.

(b) We can compute the probable error of the estimated value of y by
multiplying $S_y$ by 0.6745. This can then be used with a plus or minus
sign with the standard scientific meaning. Thus

$$y_p = 1807 \pm 43$$

means that the probable error of the estimated value is 43, and that the
probability that the true value lies between 1807 + 43 and 1807 - 43 is
one-half.

(c) The probability that the true value will lie within any given range
of values can be found as follows. Subtract each boundary of the range
from $y_p$; divide the results by $S_y$; find the resulting values in the t column
of the standard curve tables; read the two corresponding values in the
“area” column; and finally, subtract the smaller from the larger of these
two areas, or, if the two values of t differ in sign, add the two areas. This
procedure is illustrated in Article 8.
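
The table look-up in (c) can be replaced by a direct evaluation of the normal curve. The sketch below is one way of doing so, assuming normally distributed errors of prediction; it uses the error function from Python's standard library in place of the printed tables, and the function names are illustrative. The standard error is recovered here from the probable error of 43 in the example under (b).

```python
from math import erf, sqrt


def normal_cdf(t):
    """Area under the standard normal curve to the left of t."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def probability_in_range(predicted, s_y, low, high):
    """Probability that the true value lies between low and high."""
    t_low = (low - predicted) / s_y
    t_high = (high - predicted) / s_y
    return normal_cdf(t_high) - normal_cdf(t_low)


predicted = 1807.0
s_y = 43.0 / 0.6745  # standard error implied by a probable error of 43

p_probable_error = probability_in_range(predicted, s_y, 1807 - 43, 1807 + 43)
p_one_sigma = probability_in_range(predicted, s_y, predicted - s_y, predicted + s_y)
print(round(p_probable_error, 3), round(p_one_sigma, 3))
```

Working with the cumulative area removes the need to decide whether two tabulated areas are to be added or subtracted; the difference of the two cumulative values covers both cases.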

IV. The value of r obtained by the procedures outlined above is a
quantity which describes exactly the correlation existing in the data used
for the computations. It is not necessarily the best possible guess as to
the amount of correlation existing in the universe from which the data
were taken, and if we wish to make the best possible estimate of the
correlation in the universe, a small correction is necessary. The larger the
sample, the more nearly the value of r will describe the universe, and the
following limits are suggested.

If r is 0.2 or greater and N is 50 or greater, or
if r is 0.5 or greater and N is 20 or greater, or
if r is 0.7 or greater and N is 10 or greater, then
no correction is necessary unless high accuracy (closer than 0.05) is
desired in the estimate for the universe. If the values of r and N fall below
these limits, then the value of r for the universe should be computed from
equation 9-15-1,

K = 



V. The standard deviation of an estimated r for the universe is given
by equation 9-15-2,


1 - 



This can be used to compute confidence limits in the usual way if N is
large, but if N is small the distribution of r's ceases to be approximately
normal, and the normal curve tables are no longer appropriate for the
computation of confidence limits.

SAMPLING AND RELIABILITY


1. THE UNIVERSE AND THE SAMPLE 

Sometimes a statistical problem is of such a nature that all the relevant
data are available to the investigator. For example, if a statistician were
studying the age distribution of the members of the present United States
Senate, he could obtain the birth date of every man now in the Senate,
and he could compute an arithmetic mean and a standard deviation which
would be completely accurate for the group. No question could arise of
whether the arithmetic mean and the standard deviation “represent” the
group.

This situation is, however, an unusual one. In general it is not feasible
to obtain data concerning all of the group about which you wish to draw
conclusions, and in many cases it is entirely impossible to do so. If, for
example, an investigator wishes to study the distribution of heights of
American men, it would not be feasible to collect data concerning all the
millions of American men now living. Furthermore, it would not be
necessary. Instead a group of several hundred men would be chosen in
as representative a way as possible, and the mean and standard deviation
of this selected group, or sample, would be computed. The investigator
would then make the assumption that the mean and standard deviation of
the heights of all the men in America are the same as those of the sample,
or are nearly enough so for his purposes.

Or let us suppose that an agricultural research worker wishes to study
the advantages and disadvantages of raising a given variety of tomatoes
in southeastern Ohio. He would undertake to find the average yield per
acre and the standard deviation of this average yield, and he would collect
all available records of crop sizes for this purpose. But it is very unlikely
that he would be able to discover the yield per acre from every farmer
who had ever raised this variety of tomatoes. He must, then, compute
the mean and standard deviation from the data available, and assume that
these are representative not only of his working data, but also of the data
which were not available to him. Furthermore, even if he were able to
collect data about all the tomatoes which had ever been raised in Ohio,
this would not fulfil the purpose of his study. If his ultimate purpose is
to make recommendations to farmers, then he must assume that his
results are representative also of all tomatoes of this variety which will be
raised in the future in southeastern Ohio. One part of his “universe” is
therefore completely inaccessible to him.

Again, let us suppose that a company manufactures ammeters for
automobile instrument panels and has an output of 7000 per day. The
resistances of the ammeters would be nearly alike, but there are small
variations from one to another, and it is desirable to study the distribution
of these resistances. In this case the statistician is not limited by the
availability of data (he could measure the resistance of all 7000 each
day if he wished), but he would nevertheless select a sample containing a
much smaller number solely for the purpose of saving time. The
statistician here makes the assumption of “representativeness” of his own
volition.

A slightly different situation arises if we suppose that the same company
also manufactures fuses for electric circuits and requires a description of
the distribution of the amperages at which the fuses “blow” or break.
Such information is necessary in the marketing of the fuses, but each time
a specific fuse is tested it is of course destroyed in the process and is not
available for marketing. The statistician in this case has no choice; he is
forced to test a sample and assume that his results are valid for the
universe.

We have seen that in all such cases the statistician uses the term
“sample” to describe the small group of items which are used in the study,
and the term “universe” for the entire group of comparable items about
which he wishes to draw conclusions. Sometimes it is difficult to decide
just what the limits of the universe are, particularly if the statistician is
working with data collected by other investigators for their own purposes.
What is the universe, for example, corresponding to a table of heights and
weights of the male members of the Senior class at Ohio University in
1950? Does the universe consist of all American men? All American male
college students? Or all Ohio men? The answer to such questions is
very difficult, and an unwise answer can invalidate many of the
statistician's conclusions.

Another aspect of the problem arises when the statistician knows what
his universe must be and wishes to select a sample which will represent it.
Suppose that he wishes to know the average income of Ohio University
male students. Can he use the members of a given fraternity for a sample?
Obviously not, because there is a tendency for fraternities to select students
with higher than average incomes. Can he obtain a representative sample
by questioning students at random on the campus? Perhaps so, but it is
conceivable that at the time of his census a number of students are working
and that the students who are walking the campus are the non-employed
ones, who represent a non-typical sample so far as income goes. The
problem of securing a truly random sample is a very difficult one, filled
with unexpected problems, and failure to secure a representative sample
is one of the major sources of error in practical statistical investigations.
A detailed discussion of this aspect of the problem will be postponed until
Chapter 13, and in the present chapter it will be assumed that the universe
is homogeneous and unambiguously defined and that the sample is free
from systematic errors of selection.

In all cases we study the special properties of the sample and assume
that the universe has the same or similar properties. This assumption
introduces two specific questions.

A. Are there any systematic and predictable differences between the
properties of the sample and properties of the universe? If we find that
the standard deviation of a sample of twenty-five items is 2.7, does it
follow that 2.7 is the best possible guess as to the standard deviation of
the universe? If not, what corrections should we apply to obtain the
most likely value or “best guess”?

B. When we have estimated the most likely value of any descriptive
quantity for the universe, within what limits can we trust it to agree with
the facts? If the mean breaking amperage of a sample of twenty fuses is
24.75 amperes and the standard deviation is 0.38 ampere, can we be
reasonably sure that the mean of the entire output is between 24.50 and
25.00? Between 24.70 and 24.80? What is the specific probability that the
true value will fall within each of these two ranges?

In a general way, the range of reliability is obvious from a common
sense inspection of the problem. In B, for instance, if a different set of
twenty fuses had been chosen, it is highly unlikely that it would also have
a mean breaking amperage of 24.75 and a standard deviation of 0.38. It
would not be astonishing, for instance, if a second sample had a mean of
24.85, with a standard deviation of 0.45. If it were possible to sample
the entire day's output, it would probably differ from both of these results.
On the other hand, it would probably not differ from them by very much.
If we have only the first sample, the most that we can say is that the mean
of the day's output is probably very close to 24.75, or that it is equal to
24.75 plus or minus a possible small error, and we cannot make a more
exact statement until we know something about the distribution of the
possible values of this small error. One way to investigate this distribution
experimentally is to select several other samples and observe the range
between their means. If, for instance, we find that when five samples
are selected and studied, the five means all agree within 0.50 ampere, then
we could reasonably conclude that the mean of the universe will probably
not differ from the sample means by more than this.

But there is a much better way to determine this range of uncertainty.
It depends not upon experiment but upon theoretical laws, which will be
derived in the following paragraphs. First, however, we must set up a
specific method for describing this range of uncertainty in a quantitative
manner.

2. STANDARD DEVIATION AS MEASURE OF RANGE OF UNCERTAINTY 

The discerning reader will have noted that in the course of the preceding
chapters the meaning of the standard deviation has undergone a change
of emphasis. In Chapter 3 the standard deviation was introduced as a
descriptive device, used for the purpose of condensing and summarizing
the essential data contained in a frequency table, and it was defined as the
square root of the average of the squares of the deviations from the
arithmetic mean. In later chapters it was used to indicate the half-width of
the region containing 68 per cent of the probability of occurrence of a
variate chosen at random, in the same way that the probable error is used
to indicate the half-width of the region containing 50 per cent of the
probability of occurrence. In the illustrative problem in Article 9 of
Chapter 9, for example, we computed the predicted score for a student,
Don Poller, and found it to be 15.5, with a standard deviation of 2.7, or a
probable error of 1.8. It is true that this standard deviation describes the
distribution of errors of prediction for all students comparable to Mr.
Poller, but we are not interested in all students, or indeed in any but Don
Poller, when we state this probable error. The standard deviation of 2.7
tells us that the probability is 0.68 that Mr. Poller will score between 12.8
and 18.2, and the probable error of 1.8 tells us that the probability is 0.50
that he will score between 13.7 and 17.3. The standard deviation has
now become a device for stating our beliefs about the probability of
occurrence of a given event, and we can use it in this sense whenever we
have sufficient evidence about the probabilities to do so. The probable
error is used in this sense somewhat more than the standard deviation,
particularly by workers in the exact sciences. For instance, the mean
distance from the earth to the sun is reported in astronomical publications
as 93,005,000 miles ± 9000 miles. This means that, from an analysis
of the results of all modern measurements of this distance, by any method,
the investigators have concluded that the probability is 0.50 that the true
distance to the sun lies somewhere between 92,996,000 miles and 93,014,000
miles. This meaning of the standard deviation and the probable error will
be useful to us in this chapter for stating the range of reliability of any
conclusions which we draw about a universe from a study of a sample.

3. TABLE FOR SAMPLING EXPERIMENTS 

In the following paragraphs we will develop a set of formulas for finding
the standard deviation of the sum of two variates, of the difference of two
variates, and of the arithmetic mean of any number of variates. These
formulas are very remarkable; in fact many students feel that they are a
little uncanny in their ability to predict the standard deviations of
distributions which do not yet exist. In order to provide you with an
opportunity to work with these formulas, several sampling tables are
included here, and it is strongly urged that you carry out for yourself the
experiments described.

Table 10-3-1 contains 300 numbers which form a normal distribution,
with an arithmetic mean of 70.5 and a standard deviation of 4.87. The
300 numbers are tabulated in a random order. Samples may be chosen
from this table in either of two ways. First, you may make a set of 300
slips of paper and write one of the numbers on each slip, and then, to
draw a sample, you can mix the slips thoroughly and take the required
number of slips from the top of the pile. Secondly, you can use the table
as it is printed and choose the required set of numbers according to any
Table 10-3-1. Experimental Sampling Table

$\bar{x} = 70.5$     $\sigma_x = 4.87$

66   77   78   71   75   75   76   64   79   69   79   75   68   71   78
70   67   73   62   70   77   69   77   73   68   63   71   73   68   68
68   70   74   76   69   71   65   64   63   69   78   70   78   66   62
70   62   75   81   74   68   72   67   71   73   75   71   66   76   70
61   68   72   63   69   77   70   70   60   65   71   72   73   68   82
67   73   76   66   76   69   70   71   72   61   72   75   63   68   70
79   71   73   62   75   69   69   70   75   69   65   69   73   75   71
64   59   67   66   79   69   69   74   68   72   76   73   64   70   63
72   74   72   70   67   74   74   74   75   77   65   69   70   66   70
73   74   68   69   74   69   67   73   74   75   69   67   68   70   63
71   78   66   71   70   64   71   68   67   67   76   70   74   71   66
73   73   66   67   67   71   71   77   67   71   71   65   83   65   69
70   65   68   74   80   80   77   78   68   65   79   64   66   76   70
75   66   71   64   73   78   72   66   69   75   65   59   76   72   67
65   69   75   72   72   62   80   66   81   79   76   73   74   74   81
74   63   72   66   72   69   71   66   61   65   70   72   67   67   77
63   80   69   71   67   67   70   74   74   76   68   68   66   65   74
72   58   72   61   70   73   71   68   68   78   73   69   72   72   71
73   64   67   82   60   76   70   72   64   62   75   72   76   60   75
74   73   67   72   73   77   64   69   72   68   77   71   68   65   73


previously selected pattern; for example, you may select the first number
in the first column, the second number in the second column, the third
number in the third column, and so forth, or you may select the fifth, tenth,
fifteenth, twentieth, and so forth from the numbers in the table. The
pattern of selection should of course be chosen in advance, in order to avoid
the possibility that one might be subconsciously selecting larger than
average numbers or otherwise non-random numbers. As you perform the
experiments, record all your results, since some of the experiments will
demonstrate or verify several of the standard formulas simultaneously.

4. STANDARD DEVIATION OF A DIFFERENCE 

To find this experimentally, select about twenty pairs of numbers at
random and subtract the first from the second of each pair. This will
give you a new distribution of twenty numbers, some of which will be
positive and some negative. Compute the standard deviation of this new
distribution, and see how it compares with the standard deviation of the
original distribution. Do you, at this point, expect it to be larger, or
smaller, or about the same?

In order to discuss a specific result of this experiment, let us carry it
out for the following sample. Subtract each number in the second column
from the corresponding number in the first column. Thus the first
difference in our sample is 66 minus 77, or -11, and the following numbers
are 3, -2, 8, -7, -6, 8, 5, -2, -1, -7, 0, 5, 9, -4, 11, -17, 14, 9, and 1.
The arithmetic mean of these twenty differences is 0.8. The expected or
most likely value for this mean is zero, and the fact that we did not obtain
zero in the experiment is a demonstration of the fact that conclusions
drawn from a sample cannot be expected to apply exactly to the universe.

Our chief interest in this experiment, however, lies in the standard
deviation of these twenty differences. The twenty numbers are scarcely
worth grouping into a frequency tabulation, and we use formula 4-4-1,
which gives us

$$\sigma = \sqrt{61.8 - 0.64} = 7.82$$

Thus we conclude from this sample that the standard deviation of the
difference between two random numbers in the table is 7.8 and that the
mean of such a difference is 0.8. This is an experimental determination,
to be compared with a theoretical value to be computed later. In the
language of probability it asserts that if a pair of numbers is chosen from
this table at random, the probability is 0.68 that the difference will be
within 7.8 units of 0.8, that is, that it will be between -7.0 and 8.6.
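
The experiment above is easily repeated by machine. The sketch below takes the first two columns of Table 10-3-1 as they are printed in the text and applies the same computation, the square root of the mean square minus the squared mean; the variable names are illustrative.

```python
from math import sqrt

# First and second columns of Table 10-3-1, read down the twenty rows.
first_column = [66, 70, 68, 70, 61, 67, 79, 64, 72, 73,
                71, 73, 70, 75, 65, 74, 63, 72, 73, 74]
second_column = [77, 67, 70, 62, 68, 73, 71, 59, 74, 74,
                 78, 73, 65, 66, 69, 63, 80, 58, 64, 73]

# First column minus second column, as in the worked example.
differences = [a - b for a, b in zip(first_column, second_column)]

count = len(differences)
mean_difference = sum(differences) / count
mean_square = sum(d * d for d in differences) / count
sd_difference = sqrt(mean_square - mean_difference ** 2)

print(differences)
print(round(mean_difference, 2), round(mean_square, 2), round(sd_difference, 2))
```

Choosing a different pair of columns, or any other pattern fixed in advance, gives a fresh sample of twenty differences with which to repeat the comparison.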

It will be noted that the standard deviation of the difference between
two variates is somewhat larger than the standard deviation of the original
table of variates; it is in fact about one and a half times as large. Our
primary purpose is to find a way to predict this standard deviation from a
theoretical formula, so that we will have a rapid and accurate measure of
the uncertainty of a difference between two quantities, when their separate
uncertainties are known. This can be achieved as follows.

If we denote the first of the variates by x, and the second by y, then the
difference between the two is x - y, and the standard deviation of this
difference can be denoted by $\sigma_{x-y}$. If we apply our basic computational
formula 4-4-1 to this, we have

$$\sigma_{x-y} = \sqrt{\overline{(x - y)^2} - \left(\overline{x - y}\right)^2}$$

or

$$\sigma_{x-y} = \sqrt{\overline{(x^2 - 2xy + y^2)} - (\bar{x} - \bar{y})^2}$$

or, removing parentheses and regrouping,

$$\sigma_{x-y} = \sqrt{\left(\overline{x^2} - \bar{x}^2\right) + \left(\overline{y^2} - \bar{y}^2\right) - 2\left(\overline{xy} - \bar{x}\,\bar{y}\right)}$$

We recognize $\overline{x^2} - \bar{x}^2$ as the square of $\sigma_x$, and $\overline{y^2} - \bar{y}^2$ as the square
of $\sigma_y$. The factor $\overline{xy} - \bar{x}\,\bar{y}$ is a measure of the correlation between x and y,
and we recognize it as the numerator of the standard expression for r in
equation 9-6-1. If we solve 9-6-1 for $\overline{xy} - \bar{x}\,\bar{y}$, we see that it is equal to
$r_{xy}\sigma_x\sigma_y$, where we have added a subscript to r to avoid confusion. Making
these substitutions, we have

$$\sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2 - 2 r_{xy} \sigma_x \sigma_y} \qquad (10\text{-}4\text{-}1)$$

In the experimental example there is no correlation between the first
member and the second member of a pair; in other words, the value of the
first does not have any systematic influence on the value of the second,
and $r_{xy}$ is therefore zero. This situation occurs so frequently that it is
convenient to have a separate formula for it:

$$\sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2} \quad \text{(No correlation between x and y)} \qquad (10\text{-}4\text{-}2)$$

In words, this formula says that the standard deviation of a difference
between two independent variates is the square root of the sum of the
squares of their separate standard deviations. In our experimental
example, the standard deviation of the first variate is simply 4.87, as is also
that of the second, and the standard deviation of the difference between
the two is therefore

$$\sigma_{x-y} = \sqrt{4.87^2 + 4.87^2} = 6.89$$

Thus the result predicted by the formula for our experimental standard
deviation is 6.9, while the actual result of the experiment is 7.8. The
theoretical value is the standard deviation of the universe of all possible
differences between numbers in the table, while the experimental result
is the standard deviation of a sample of twenty such differences. If we
took larger and larger samples upon which to base our experimental
standard deviation of a difference, we would, on the average, come closer
and closer to 6.89. The students who will perform this experiment with a
number of different samples* will discover that the observed standard
deviations will cluster around 6.89, and that a result based upon a large
sample is likely to be closer to 6.89 than one based upon a small sample.

*Such a set of experiments is valuable, but laborious. If an entire class is studying
this material simultaneously, it is suggested that the work be divided, each member
choosing his sample in a different way.

Before we proceed to the standard deviations of other quantities, let
us study some illustrations of equations 10-4-1 and 10-4-2.
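
Before turning to the illustrations, it may help to see that equation 10-4-1 is an exact identity for any set of pairs, and not merely an approximation. The sketch below checks this on six invented pairs by computing the standard deviation of the differences directly and again through the formula; the function names and the data are hypothetical. It also evaluates equation 10-4-2 for the universe of Table 10-3-1.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Standard deviation as sqrt(mean of squares - square of mean)."""
    m = mean(values)
    return sqrt(mean([v * v for v in values]) - m * m)


def correlation(xs, ys):
    """Equation 9-6-1."""
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    return (mean_xy - mean(xs) * mean(ys)) / (std(xs) * std(ys))


def sd_of_difference(sd_x, sd_y, r=0.0):
    """Equation 10-4-1; with r = 0 it reduces to equation 10-4-2."""
    return sqrt(sd_x ** 2 + sd_y ** 2 - 2.0 * r * sd_x * sd_y)


# Hypothetical pairs, invented for illustration only.
xs = [3, 7, 8, 12, 15, 9]
ys = [5, 6, 11, 10, 13, 4]

direct = std([x - y for x, y in zip(xs, ys)])
via_formula = sd_of_difference(std(xs), std(ys), correlation(xs, ys))

# Universe value for two independent draws from Table 10-3-1.
universe = sd_of_difference(4.87, 4.87)
print(round(direct, 4), round(via_formula, 4), round(universe, 2))
```

The first two printed values agree to rounding error, whatever pairs are supplied, because the derivation involves nothing but algebra on the sample itself.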

(1) In a golf tournament between two rival clubs, the members of one
club are matched at random with members of the other club. If the
average score of club A is 97, with a standard deviation of 16, and that of
club B is 93, with a standard deviation of 12, what will be the average
margin of victory of club B over club A, and what will be the standard
deviation of the margin of victory? What fraction of the matches should
the poorer club expect to win?

Answer. The expected average margin of victory will obviously be
97 - 93, or 4, and the standard deviation of this margin will be, from
formula 10-4-1, $\sqrt{16^2 + 12^2}$, or 20. Thus, for 68 per cent of the games,
we should expect the margin of victory for Team B to be between 24 and
minus 16, where of course a negative margin of victory indicates defeat
for Team B. To answer the remainder of the question, we must use the
normal curve to find what fraction of the differences will be negative.
In other words we wish to know what fraction of a distribution will be
less than zero if the mean of the distribution is 4 and the standard
deviation is 20. If we convert zero to t units, remembering that the arithmetic
mean is 4 and the standard deviation is 20, we find that t is minus 0.2.
We look up this value of t in the normal curve tables and find that the
corresponding value of area is 0.0793. Thus we expect 0.079 of the total
games to end in victories between zero and 4, and of course we expect
0.50 to end in victories greater than 4. This leaves 0.421 for the fraction
which should end in victories less than zero, in other words, in defeats for
Team B. Team A can therefore be expected to win about 42 per cent of
the games.
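
The golf illustration can be checked numerically. The sketch below assumes, as the answer does, that the margins of victory are normally distributed, and it evaluates the normal curve with the error function from Python's standard library in place of the printed tables; the names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(t):
    """Area under the standard normal curve to the left of t."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


mean_margin = 97 - 93                # club B wins by 4 strokes on the average
sd_margin = sqrt(16 ** 2 + 12 ** 2)  # random matching, so no correlation

t_zero = (0 - mean_margin) / sd_margin
fraction_won_by_a = normal_cdf(t_zero)  # margins below zero are defeats for B

print(sd_margin, t_zero, round(fraction_won_by_a, 3))
```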

(2) In a glass factory, it is necessary to take the hot glass from one room
to another in which the temperature may be different. A sudden cooling
of 10 degrees or more may crack the glass. The mean daily temperature
in the first room is the same as that in the second, but the standard
deviation of the temperature in the first room is 8 degrees and that in the
second room is 6 degrees. What proportion of the products should the
manufacturers expect to lose by cracking?

Answer. It would be a serious blunder to apply here the method used
for the preceding problem. It is extremely unlikely that the temperatures
in the two rooms are independent, because both are likely to be influenced
by the same factors, such as outside temperature or the effectiveness of the
heating system. The problem can be answered only if we know the
coefficient of correlation between the temperature in the first room and
that in the second. If, for example, this coefficient is 0.80, then we would
have, from equation 10-4-1,

$$\sigma_{x-y} = \sqrt{6^2 - 2 \times 6 \times 8 \times 0.80 + 8^2} = 4.8$$

To find what fraction of the differences should be greater than 10, we
convert 10 to t units as before, and find that t is 10/4.8, or 2.08. The area
corresponding to this value of t is 0.481. Of the 0.50 which we expect to
be greater than the mean, this leaves 0.019 which will deviate from the
mean by more than 10. In other words, about 2 per cent of the products
can be expected to be lost by cracking.
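
How serious the blunder would be can be seen by working the glass problem both ways. The sketch below, again assuming normally distributed temperature differences, compares the loss expected when the correlation of 0.80 is taken into account with the loss that would be predicted if the two rooms were wrongly treated as independent. The function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(t):
    """Area under the standard normal curve to the left of t."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def fraction_cracked(sd_first, sd_second, r, critical_drop=10.0):
    """Fraction of temperature differences exceeding the critical drop."""
    sd_diff = sqrt(sd_first ** 2 + sd_second ** 2
                   - 2.0 * r * sd_first * sd_second)  # equation 10-4-1
    return 1.0 - normal_cdf(critical_drop / sd_diff)


loss_correlated = fraction_cracked(8.0, 6.0, r=0.80)
loss_if_independent = fraction_cracked(8.0, 6.0, r=0.0)

print(round(loss_correlated, 3), round(loss_if_independent, 3))
```

Ignoring the correlation would overstate the expected loss roughly eightfold, since the standard deviation of the difference would then be taken as 10 degrees instead of 4.8.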

PROBLEMS 

1. In Table 1-4-1, $\sigma_M$ is 13.5, $\sigma_L$ is 14.7, $r_{ML}$ is 0.32, $\bar{M}$ is 71.3, and
$\bar{L}$ is 70.5. Compute the standard deviation of the difference between a
student's mathematics grade and his language grade. If an entry in Table
1-4-1 is picked at random, what is the probability that this difference will
be larger than 10? Test your answer by making several random selections
from the table.

2. For the data in Problem 1, Article 12, Chapter 9, compute the average
difference between the ages of husband and wife, and the probable error of the
difference. What is the probability that a husband chosen at random will be at
least ten years older than his wife?

5. STANDARD DEVIATION OF DIFFERENCES BETWEEN 
CORRELATED VARIATES 

To illustrate the application of the formulas for correlated data, we must
use a table of pairs of numbers in which there is a relationship between the
first and the second. Table 10-5-1 contains such a set of pairs. We can


Table 10-5-1 Sampling Table of Correlated Data 


X 

y 


X 

V 

X 

V 

X 

y 

X 

y 

54 

32 


49 

32 

55 

31 

51 

30 

53 
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50 

29 
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27 

50 

31 

51 

30 

51 

31 

46 
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31 

50 

30 

53 

31 

52 

30 

57 

34 


55
55

34 

49
33


49
48
30

47
46

28
33

49 

30 

45 

28 

49 

31 

54 

33 


52 

29 

50 

30
32

47 

32 

52 

30 


48 

28 

59 

35 

51 

31 

48 

30 

54 

33 


48 

29 

47 

28 

56 

32 

42 

26 

56 

33 


44 

28 

51 

31 

50 

31 

46 

29 



$\bar{x} = 50.5$     $\sigma_x = 3.83$
$\bar{y} = 30.5$     $\sigma_y = 1.94$
$r_{xy} = 0.837$


experiment with this table in either of two ways. If we form a set of
differences between each x and its corresponding y, we must use the formula
for correlated data,

$\sigma_{x-y} = \sqrt{3.83^2 + 1.94^2 - 2(3.83)(1.94)(0.837)} = 2.45$

If, on the other hand, we form a set of differences between each x and
some other y chosen at random, then there is no correlation and we must
use the formula for uncorrelated data,

$\sigma_{x-y} = \sqrt{3.83^2 + 1.94^2} = 4.29$

In both cases the average value of our differences will be (except for
sampling errors) the difference between the average value of x and the
average value of y, that is, 50.5 - 30.5, or 20.0.

Let us test this conclusion by forming a set of differences between the
first twenty pairs of numbers, reading down the columns. The first
difference is 54 - 32, or 22, the next is 21, and so forth. The mean value
of x - y is 20.10, and the standard deviation is 2.28, which is in
satisfactory agreement with the predicted value of 2.45.

To test the prediction that the standard deviation will be about 4.29 if
we choose the y's at random with relation to the x's, let us subtract each
y from the x preceding it in the table. Since the pairs of numbers were
thoroughly mixed before tabulating, this will give us a random pairing of
x's and y's. We have then the series of numbers beginning with 54 - 29 = 25,
50 - 29 = 21, 46 - 34 = 12, and so forth. The mean of the twenty
differences is 20.15 and the standard deviation is 4.49, thus confirming the
predicted value of 4.29 and demonstrating the fact that a random selection
will give us a different standard deviation from a systematic selection.
The reader is urged to make various selections himself to verify these
formulas and to investigate the limits of their accuracy of prediction.
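
A reader with access to a computer can repeat this experiment on a much larger scale. The sketch below is an illustration under stated assumptions, not a reproduction of Table 10-5-1: it builds a synthetic universe of normally distributed pairs having the means, standard deviations, and correlation quoted beneath the table, and then compares paired differences with randomly re-paired differences. The function name and the choice of 20,000 pairs are arbitrary.

```python
import math
import random
import statistics

def sd_of_difference(sd_x, sd_y, r=0.0):
    """Standard deviation of x - y; r = 0 gives the uncorrelated formula."""
    return math.sqrt(sd_x**2 + sd_y**2 - 2.0 * r * sd_x * sd_y)

SD_X, SD_Y, R = 3.83, 1.94, 0.837   # constants quoted under Table 10-5-1

predicted_paired = sd_of_difference(SD_X, SD_Y, R)
predicted_random = sd_of_difference(SD_X, SD_Y)

# Synthetic universe (an assumption, not the book's table): normal pairs
# constructed to have the stated means, spreads, and correlation.
rng = random.Random(0)
xs, ys = [], []
for _ in range(20000):
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(50.5 + SD_X * u)
    ys.append(30.5 + SD_Y * (R * u + math.sqrt(1.0 - R**2) * v))

# Each x with its own y: the correlated case.
paired = statistics.pstdev([x - y for x, y in zip(xs, ys)])

# Each x with a y chosen at random: shuffling destroys the correlation.
shuffled_ys = ys[:]
rng.shuffle(shuffled_ys)
unpaired = statistics.pstdev([x - y for x, y in zip(xs, shuffled_ys)])

print(round(predicted_paired, 2), round(paired, 2))
print(round(predicted_random, 2), round(unpaired, 2))
```

With so many pairs the experimental values should fall much closer to the predictions than they do with a sample of twenty, which is the behavior the text leads us to expect.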

PROBLEMS 

Table 9-11-1 shows the scores made by twenty-eight students on a given
examination in twelve minutes (y) and in fifteen minutes (x). The values of
$\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, and $r_{xy}$ can be obtained from the data at the bottom of
Table 9-11-4.

1. Find the mean of the additional scores made by the students in the last three
minutes, that is, of the differences x - y.

2. Find the standard deviation of these differences.

3. Compute the probable error of these differences.

4. Test this probable error by selecting a few pairs at random from the table
and computing the differences between x and y. Approximately half of these
differences should differ from the average value of x - y by less than your probable
error, and the other half should differ from the average value by more than the
probable error.


6. STANDARD DEVIATION OF A SUM 

Let us again begin our discussion with an experiment. Select twenty
pairs of numbers from Table 10-3-1 as before, but this time add the
members of each pair instead of subtracting them. Now compute the standard
deviation of the distribution consisting of these twenty sums and compare
it with your earlier results. Do you expect it to be larger than, or smaller
than, or about the same as the standard deviation of Table 10-3-1 itself?
How do you expect it to compare with the standard deviation of a
difference which we obtained in Article 5?

Again we will describe such an experiment based upon a specific sample
and leave it to the reader to perform similar experiments based upon other
samples. Let us add each of the numbers in column one to the corresponding
numbers in column two. This gives us a set of twenty numbers
beginning with 143, 137, 138, and so forth. The mean of these numbers is
139.2, and the standard deviation is 7.15. The reader may be surprised
to note that this standard deviation is about the same as that for the
difference between any two numbers in the table.

If we repeat the derivation in Article 4, replacing x - y by x + y, we
obtain

$\sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2 + 2 r_{xy} \sigma_x \sigma_y}$     (10-6-1)

or, if there is no correlation,

$\sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2}$   (no correlation between x and y)     (10-6-2)

which is identical with the standard deviation of the difference. The
expected result of this experiment is therefore also 6.89.
To complete the experimental verification of these formulas, let us
choose twenty x's at random from the table of correlated data (10-5-1)
and add them to their corresponding y's. If we choose the first twenty
pairs as before, we have the sums 86, 79, 75, and so forth, with a mean
and standard deviation of 81.4 and 5.85. Since these pairs are correlated,
we use formula 10-6-1, obtaining for the theoretical standard deviation

$\sigma_{x+y} = \sqrt{3.83^2 + 1.94^2 + 2(3.83)(1.94)(0.837)} = 5.56$

as compared to 5.85 for the experimental result. Now let us pair the y's
with the x's at random, by adding each y to the x preceding it in the table,
thus destroying the correlation. We then have the twenty numbers 83,
79, 80, and so forth, with a mean and standard deviation of 81.35 and 4.36.
To find the theoretical values, we use formula 10-6-2, which gives us as
before $\sigma_{x+y} = 4.29$.
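
The two theoretical values just quoted differ from those for a difference only in the sign of the correlation term. The following sketch makes the comparison explicit; the function names are illustrative and the constants are those given beneath Table 10-5-1.

```python
import math

def sd_of_sum(sd_x, sd_y, r=0.0):
    """Equation 10-6-1; with r = 0 it reduces to equation 10-6-2."""
    return math.sqrt(sd_x**2 + sd_y**2 + 2.0 * r * sd_x * sd_y)

def sd_of_difference(sd_x, sd_y, r=0.0):
    """The corresponding formula for x - y (note the minus sign)."""
    return math.sqrt(sd_x**2 + sd_y**2 - 2.0 * r * sd_x * sd_y)

sd_x, sd_y, r = 3.83, 1.94, 0.837

correlated_sum = sd_of_sum(sd_x, sd_y, r)     # each x with its own y
uncorrelated_sum = sd_of_sum(sd_x, sd_y)      # x's and y's paired at random

print(round(correlated_sum, 2), round(uncorrelated_sum, 2))

# Without correlation the sum and the difference have the same spread.
print(math.isclose(uncorrelated_sum, sd_of_difference(sd_x, sd_y)))
```

A positive correlation therefore enlarges the standard deviation of a sum and reduces that of a difference, while a negative correlation does the opposite.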


PROBLEMS 

1. A coffee packaging machine automatically weighs and packages coffee in
1-pound bags. There is a little random variation in weight from bag to bag, and
the standard deviation of the actual weights is 0.02 pound. There is a small
demand for coffee in 2-pound bags, and the owner plans to readjust the machine
so that it operates twice each time such a bag is presented. How large should he
expect the standard deviation of the weights of the 2-pound bags to be, assuming
that there is no correlation between successive operations of the machine?

2. Upon further investigation of the coffee packaging machine, it is found that
part of the variation in weight is due to the retention of a small amount of coffee
in the machine, so that a slightly underweight bag is frequently followed by a
slightly overweight one. There is, in other words, a correlation between the amount
of coffee dispensed in any operation and the amount dispensed in the previous
operation, and the correlation coefficient is -0.40. How should the answer to
Problem 1 be modified?

3. In Table 9-11-1, what is the standard deviation of the sum of the two scores?
Compute the probable error of such a sum and test it by selecting several sums
at random to see if approximately half of them are within one probable error of
the expected value.

4. Derive equation 10-6-1 in detail.

7. STANDARD DEVIATION OF ARITHMETIC MEAN 

Let us now perform an experiment to discover the nature of the
distribution of a set of means of samples chosen from our universe of numbers
in Article 4. This is a very important experiment, but since it is lengthy,
it is suggested that the reader perform it in cooperation with other
students, each student taking a separate part of the work.

(a) Choose twenty pairs of numbers from Table 10-3-1 at random,
average each pair, then find the mean and standard deviation of the
resulting distribution of means. As an illustrative example, the mean of the
first number in the first column (66) and the first number in the second
column (77) is 71.5; the mean of the two numbers below these (70 and 67)
is 68.5, and if we continue, reading down the first two columns in this way,
we obtain twenty means of pairs continuing 69.0, 66.0, and so forth. The
standard deviation of these twenty means is 3.58. Thus we see that the
standard deviation of the mean of two variates is smaller than the standard
deviation of a single variate, but not related to it in any simple or obvious
way.

(b) Choose twenty sets of five numbers at random, average each set,
then find the standard deviation of these twenty means of five. To
continue our illustrative example, let us use the left-hand five columns in
the table as a sample and obtain our twenty means by averaging the
variates in each row. The first result will be the mean of the first five
numbers in the top row (66, 77, 78, 71, and 75), which is 73.4; the mean
of the second row is 68.4, and so forth. The standard deviation of this
group of twenty means of five is 1.89. We see that the standard
deviation of a mean of five is somewhat smaller than that of the mean of two.

(c) Repeat the experiment, choosing now ten numbers in each mean.
Continuing our illustrative example, let us now choose the variates in the
first ten columns, and average each row of this group. The results are
73.0, 70.6, and so forth, with a standard deviation of 1.56. We see that
the standard deviation continues to decrease, but rather slowly.

(d) Repeat the experiment, choosing now fifteen numbers in each
mean. For our illustrative example we must use the entire table to get
twenty means, and we obtain each mean by averaging an entire row.
The means are now 73.4, 69.9, and so forth, and the standard deviation
of the group of means is 1.23.
Now let us tabulate our results:

Number of Variates     Standard Deviation
in Each Mean           of Mean

        1                    4.87
        2                    3.58
        5                    1.89
       10                    1.56
       15                    1.23

We see that the standard deviation of a mean depends strongly upon the
number of variates included in the mean, but that the exact relationship
between the two is not obvious.

To find the theoretical relationship between the standard deviation of
a mean and the number of variates included in the mean, let us study first
some preliminary principles.

(1) The formula for the standard deviation of a sum of two uncorrelated
variables can be extended to include the sum of any number of variables.
This can be proved by successive applications of equation 10-6-2:

$\sigma_{x+y+z} = \sigma_{(x+y)+z} = \sqrt{\sigma_{x+y}^2 + \sigma_z^2} = \sqrt{\sigma_x^2 + \sigma_y^2 + \sigma_z^2}$

or, in general,

$\sigma_{x_1 + x_2 + \cdots + x_N} = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_N^2}$     (10-7-1)

where we have used $x_1, x_2, \ldots, x_N$ to denote N variables, and
$\sigma_1, \sigma_2, \ldots, \sigma_N$ to denote their standard deviations.

(2) If we form the sum of N random variates, all from the same table,
then there is no correlation between them, and the above formula is
applicable. Furthermore, the standard deviation of each one will be simply
the standard deviation of a single variate, and we have

$\sigma_{\Sigma x} = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_N^2} = \sqrt{N \sigma_x^2} = \sqrt{N}\,\sigma_x$     (10-7-2)

or, the standard deviation of the sum of N similar variates is equal to the
standard deviation of a single variate times the square root of the number
of variates.

(3) The standard deviation of any constant, C, times a variate, v, is,
by 4-4-1,

$\sigma_{Cv} = \sqrt{\overline{C^2 v^2} - (\overline{Cv})^2}$

This reduces simply to

$\sigma_{Cv} = \sqrt{C^2 (\overline{v^2} - \bar{v}^2)} = C \sigma_v$     (10-7-3)

or, the standard deviation of any constant times any variate equals the
constant times the standard deviation of the variate. We have already
used this result in a slightly different form in equation 4-5-7.

(4) We are now ready to assemble these preliminary conclusions to
find the formula for the standard deviation of the mean. We can write
this standard deviation in the form

$\sigma_{\bar{x}} = \sigma_{(1/N)\Sigma x}$

and then, using our third preliminary conclusion (10-7-3), we can place
the factor 1/N, which is a constant multiplier, in front:

$\sigma_{\bar{x}} = (1/N)\,\sigma_{\Sigma x}$

Using the second preliminary conclusion, we can replace $\sigma_{\Sigma x}$ by $\sqrt{N}\,\sigma_x$:

$\sigma_{\bar{x}} = (1/N)\sqrt{N}\,\sigma_x$

or, combining the factors containing N,

$\sigma_{\bar{x}} = \dfrac{\sigma_x}{\sqrt{N}}$     (10-7-4)

This is an important and far-reaching conclusion. In words, it is: The
standard deviation of the arithmetic mean is equal to the standard deviation
of a single variate, divided by the square root of the number of variates
included in the mean.

If we apply this formula to our experimental data, we have the following
results:

Number of     Experimental     Theoretical
Variates      Standard         Standard
              Deviation        Deviation

    1            4.87             4.87
    2            3.58             3.44
    5            1.89             2.18
   10            1.56             1.54
   15            1.23             1.26
  300             -               0.28


In each case the agreement between the experimental and the theoretical
values is fairly good. It should be remembered that the theoretical value
is the number which we expect the experimental value to approach as the
sample size is increased. If, for example, we had used fifty groups of five
numbers to find the standard deviation of the mean of five variates, the
resulting experimental standard deviation would probably have come
closer to the theoretical value than it did with a sample of only twenty
groups.
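
The claim that more groups bring the experimental value closer to the theoretical one can be tried directly. The sketch below is an illustration only: it assumes a normal universe with the single-variate standard deviation 4.87 used above, an arbitrary universe mean of 70, and 4000 groups for each sample size instead of twenty. None of these choices comes from Table 10-3-1 itself.

```python
import math
import random
import statistics

SIGMA = 4.87          # standard deviation of a single variate (from the text)
UNIVERSE_MEAN = 70.0  # assumed; it does not affect the spread of the means
GROUPS = 4000         # many more than the twenty groups used in the text

def sd_of_sample_means(n, rng):
    """Draw GROUPS samples of n variates; return the sd of their means."""
    means = []
    for _ in range(GROUPS):
        sample = [rng.gauss(UNIVERSE_MEAN, SIGMA) for _ in range(n)]
        means.append(sum(sample) / n)
    return statistics.pstdev(means)

rng = random.Random(1)
results = {}
for n in (1, 2, 5, 10, 15):
    experimental = sd_of_sample_means(n, rng)
    theoretical = SIGMA / math.sqrt(n)        # equation 10-7-4
    results[n] = (experimental, theoretical)
    print(n, round(experimental, 2), round(theoretical, 2))
```

The theoretical column reproduces the one tabulated above, and the experimental column should agree with it to within a few per cent.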

The importance of the above formula can be seen if we consider that
the statistician generally wishes to find the standard deviation of the mean
of all of his variates, in order to estimate the uncertainty of the mean, and
that the experimental method will not give him this, but will give him
only the standard deviation of a subgroup.

The use which is generally made of the standard deviation of the mean
is an indirect one. If a scientist measures the percentage of impurity in
twenty-five samples of a substance and finds that the average percentage
is 3.46 and the standard deviation of his measures is 0.40, he can use
equation 10-7-4 to find that the standard deviation of the mean of 25 measures
is 0.08. The scientist is seeking the true value of the percentage of
impurity, and he assumes that this true value is the mean of the universe of
possible measurements of it. Let us call this unknown mean of the
universe X. Now let us consider the universe of possible means of twenty-five
drawn from this first universe. This universe of means will have a
mean of X and a standard deviation of 0.08, and the probability is 0.68 that
any mean of twenty-five chosen at random will be within 0.08 of X.
Therefore the probability is 0.68 that our sample mean of 3.46 is within 0.08 of X,
or, if we invert this, the probability is 0.68 that X is within 0.08 of our
sample mean of 3.46. In other words, the probability is 0.68 that X lies
between 3.38 and 3.54. While we do not know the value of X, we can
compute the likelihood that it is between any given limits. We can assert, for
example, that since the area corresponding to t = 3 is 0.4987, the
probability is 0.9974 that X is between 3.46 - 0.24 and 3.46 + 0.24; in other
words, we can assert that X is almost certainly between 3.22 and 3.70.
We can set wider limits if the practical situation requires a still more
stringent level of probability.
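
The limits in the impurity example can be reproduced for any multiple of the standard deviation of the mean. In this sketch the function names are illustrative, and the probabilities come from the error function and not from the printed table, so a figure such as 0.9974 may differ by one unit in the last place.

```python
import math

def sd_of_mean(sd_single, n):
    """Equation 10-7-4: standard deviation of the mean of n variates."""
    return sd_single / math.sqrt(n)

def probability_within(t):
    """Probability that a normal variate lies within t sds of its mean."""
    return math.erf(t / math.sqrt(2.0))

sample_mean, sd_single, n = 3.46, 0.40, 25   # figures quoted in the text
sd_mean = sd_of_mean(sd_single, n)

limits = {}
for t in (1, 2, 3):
    low = sample_mean - t * sd_mean
    high = sample_mean + t * sd_mean
    limits[t] = (low, high, probability_within(t))
    print(t, round(low, 2), round(high, 2), round(limits[t][2], 4))
```

The rows for t = 1 and t = 3 correspond to the two pairs of limits discussed above; the row for t = 2 is added only for comparison.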

In drawing such conclusions, the scientist makes a far-reaching and
sometimes dangerous assumption. He assumes that the only errors of
measurement are random, that is, that they are just as likely to be positive as
negative. If a systematic error is present, as it might be for example if
his instruments are out of adjustment, then he will draw a totally erroneous
conclusion about the likelihood that the true value is within any given
limits. The random errors can be controlled by statistical analysis, but
the systematic errors must be detected by the alertness and ingenuity of
the investigator.


PROBLEMS 

1. The ratio of the weights of bromine and hydrogen which combine to form
hydrobromic acid was determined experimentally. The results of ten independent
determinations are as follows: 79.2863, 79.3055, 79.3064, 79.3197, 79.3114, 79.3150,
79.3063, 79.3141, 79.2915, and 79.3108.* (a) What value should the investigators
report for the result of their investigations? (b) What is the standard deviation
of this value? (c) What is its probable error? (d) Write the value with its probable
error as it would be reported in scientific literature. (e) In the light of these
measurements, how likely is it that the true value of this ratio is less than 79.3000?

*Weber, Bulletin of Bureau of Standards, Volume IX, page 131.

2. Answer illustrative question B near the end of Article 1.

3. For the data in Table 1-4-3, what is the probability that the mean of a sample
of five wires will be below 202 pounds? That the mean of a sample of twelve
wires will be below 202 pounds?

4. Using the normal curve theory, and your answer to Problem 3, Article 10,
Chapter 6, answer all problems VI, VII, and VIII of Article 4 in Chapter 1.

5. Using equation 10-7-4, answer Question 12 of Article 4, Chapter 1. What
would prevent him from obtaining unlimited accuracy by increasing the number of
observations indefinitely?


8. SUMMARY 

In many statistical problems, the investigator studies a relatively small
body of data, and from it he draws conclusions about a much larger body
of similar data. The small body of data under study is called the sample,
and the larger body from which it was drawn is called the universe.
Because of the random variation from one sample to the next, the properties*
of any sample can be expected to differ a little from the corresponding
properties of the universe. When we estimate the properties of the
universe from the properties of a sample, we therefore usually introduce an
error. The central objective of this chapter is to draw some conclusions
about the probable sizes of these errors of estimation, and to show how
much reduction in their size is to be expected if a larger sample is used. A
second objective, closely related to the first, is to predict the range of
variation from one sample to the next, when the properties of the universe
are known.

The theoretical standard deviation of a property of a sample is the
standard deviation which we would expect if we could collect a large
number of samples, measure the required property for each one, and
form a frequency tabulation of the results. For example, the standard
deviation of the arithmetic mean of a sample containing N variates is

$\sigma_{\bar{x}} = \dfrac{\sigma_x}{\sqrt{N}}$

If, for instance, a large number of samples, each containing 100 variates,
are taken from a universe which has a standard deviation of 25, and the
mean of each sample is computed, these means will form a distribution
which will have a standard deviation of approximately 2.5. It follows
from the theory of the normal curve that if any one sample mean is chosen
at random, the probability is 0.68 that it will differ from the mean of the
universe by less than 2.5.

An inversion of this argument is needed for us to determine the
reliability with which we can deduce the properties of the universe when we
know only the properties of one sample. If the mean of one sample is 34,
and the probability is 0.68 that this mean differs from the mean of the
universe by less than 2.5, then it is obvious that the probability is 0.68
that the mean of the universe is between 31.5 and 36.5. In this way the
probability that the mean of the universe will lie between any other given
limits can be calculated from equation 10-7-4 and the normal curve tables.
Examples of this computation are shown in Article 7.

*Some "properties" of a sample are its mean, its standard deviation, its skewness,
and so forth.

The standard deviations of some other quantities are as follows. If x
and y are measures of two properties of a single individual (such as height
and weight of a man), then the standard deviation of a difference between
x and y is given by

$\sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2 - 2 r_{xy} \sigma_x \sigma_y}$

where $r_{xy}$ is the correlation coefficient between x and y. If instead
x and y are simply random variates selected from two universes,
then there is no correlation between them and the standard deviation of
the difference is

$\sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2}$

Similar equations (10-6-1 and 10-6-2) give the standard deviation of the
sum of two variates. Examples of the uses of these equations are given
in Articles 5 and 6.



CHAPTER 11

TESTING STATISTICAL HYPOTHESES


1. INTRODUCTION 

We have seen that the statistical investigator usually works with a
sample, drawn in a random way from a much larger universe which is in
general inaccessible to him, and we have seen also that the objective of
the investigator frequently is to deduce the properties of the universe.
Unfortunately it is impossible, from a sample, to deduce the exact
properties of the universe, because of the presence of the element of chance in
the selection of the sample. In view of this, any specific statement about
a property of the universe has the status of a hypothesis, which may or may
not be true. It is the purpose of this chapter to explain methods of testing
the truth of such hypotheses.

It is not in general possible to prove beyond any possible doubt that a
given hypothesis is true or false; instead we must confine ourselves to a
discussion of the likelihood that it is true. Even here we must usually
proceed indirectly, since our mathematical formulation is set up to tell
us the probability of drawing various sorts of samples from a specific
known universe, while we wish to know the probability that various sorts
of universes could have been the parents of our specific known sample. In
particular, if we must choose between two hypotheses about the universe
from which the given sample was drawn, we must compute the probability,
under the first hypothesis, that a sample such as the one under study will
occur, and then compute, under the second hypothesis, another probability
of the observed sample, and choose between the two hypotheses on the
basis of these two probabilities. If the observed sample would have been
almost impossible to draw by chance under one hypothesis, it is reasonable
to reject that hypothesis. This principle is sometimes called the principle
of maximum likelihood. A formal statement of it is as follows: Between
two or more competing hypotheses, we must choose the one in the light of which
the observed sample has the greater probability of occurring, if the hypotheses
are otherwise equal in merit. We have already made use of this principle
in Article 6 of Chapter 8, in establishing the Principle of Least Squares
for the line of best fit.

2. DIFFERENCES BETWEEN MEANS

A specific kind of hypothesis which is of great importance in statistical
analysis arises when two sets of data are being compared. The hypothesis
is that the two sets of data constitute two samples drawn from the same
universe. To see why this is important, let us consider an example.

A doctor suspects that the convalescence period for a given illness can
be shortened a little if the patients are given a new drug. He tests this
hypothesis by giving the drug to 100 patients (the experimental group)
and withholding it from 100 others (the control group) who are otherwise
comparable to the first group so far as the doctor's selection can make it
so. A careful record is kept of the convalescent period of all 200 patients.
The durations for the control group are found to vary from seven days
to thirty-three days, with a mean of 19.7 and a standard deviation of 6.1.
Those for the experimental group vary from eight days to thirty-one days,
with a mean of 17.1 and a standard deviation of 5.7. Thus there appears
to be, in the difference between the two means, some evidence for the
effectiveness of the drug in shortening the duration of convalescence.
However, the difference is so small, and the scatter of each group is so
large, that it is perhaps possible to believe that the difference arose solely
by chance, and has nothing to do with the drug. Let us state these two
competing hypotheses exactly.

A. The observed difference of 2.6 days is due to the drug and we can
expect a similar difference to occur between any future treated groups and
their control groups. In other words, there is a difference between the
universe of treated patients and the universe of untreated patients.

B. The observed difference of 2.6 days is due to chance, and we can
expect future differences between such groups to be sometimes positive and
sometimes negative, with a most likely value of about zero. In other
words, the two sample means came from the same universe, and the
observed difference is simply an example of the expected random variation
between different samples from a universe.

We can test the significance of the difference either by attempting to
prove that A is true, or that B is untrue. Of these two statements, B
lends itself much more readily to mathematical analysis, and we
accordingly devote our attention to it. Let us call it the "null hypothesis."
To test this hypothesis, we must begin by making some preliminary
computations.

First we compute the standard deviations of both of the means, from
equation 10-7-4:

$\sigma_{\bar{x}} = 6.1/\sqrt{100} = 0.61$          $\sigma_{\bar{y}} = 5.7/\sqrt{100} = 0.57$

where x is used to indicate the duration for a control patient and y the
duration for an experimental patient. Next we compute the standard
deviation of the difference between the two means, by applying equation
10-4-2 to the above results:

$\sigma_{\bar{x}-\bar{y}} = \sqrt{0.61^2 + 0.57^2} = 0.835$

To take the next step, we assume as a working hypothesis that the null
hypothesis is true. This does not mean that we believe it to be true; it
means only that we wish to investigate the consequences which would
result if it were true. According to this hypothesis, the universe
consisting of all the possible differences between means of 100 durations has
an average value of zero and a standard deviation of 0.835.

If we accept the null hypothesis, we must also accept the conclusion that
in performing our experiment we selected at random one of these possible
differences and found it to be 2.6. If the probability of having picked so
large a difference at random is absurdly low, then doubt is cast upon the
validity of the null hypothesis, upon which this probability is based. If,
on the other hand, the probability of having obtained so large a difference
is high, then the null hypothesis is a reasonable one. Let us, therefore,
compute exactly the probability that a difference chosen at random from
such a universe of differences will be at least as large as 2.6. We convert
2.6 into t units by subtracting the mean value (zero) and dividing by the
standard deviation (0.835); we then find the resulting value of t (3.11) in
the tables and read the corresponding value of area, which in this case is
0.4991. This tells us that the probability of obtaining by chance a
difference between zero and 2.6 is 0.4991. The probability of obtaining a
difference between zero and -2.6 is of course also 0.4991. The remaining
probability, 0.0018, is the probability that two means of 100 chosen at
random will differ by 2.6 or more. In other words, if we accept the null
hypothesis, then we must also accept its consequence, namely, that an
event occurred even though it had a probability of less than 2 in 1000 of
occurring. It would be unreasonable to accept this consequence of the
null hypothesis, and we must therefore reject the null hypothesis itself.
Having rejected it, we must accept its alternative, and we therefore
conclude that the two samples could not have come from the same universe, or
that there is a significant difference between the means, or, in our example,
that the drug does have an effect on duration of convalescence.

The conclusion which we drew in the above paragraph is not quite an
absolute one, since there remains the probability of 0.0018 that the observed
difference could have arisen by chance. This probability we may consider
to be, for practical purposes, the probability that the null hypothesis is
correct, and we can state our conclusions exactly as follows: The available
evidence indicates that the probability that the drug does not affect the
duration of convalescence is only 0.0018, while the probability that it does
affect duration is 0.9982. Since the probability in its favor is so
overwhelming, we can state flatly that we have proved beyond a reasonable
doubt that the drug affected the duration of convalescence.

Let us summarize the operations involved in this analysis.


First step: Compute the mean and standard deviation of both x and y.

Second step: Compute the standard deviations of $\bar{x}$ and $\bar{y}$ from
equation 10-7-4.

Third step: Compute the standard deviation of $\bar{x} - \bar{y}$ from equation
10-4-2.

Fourth step: Compute t from

$t = \dfrac{\bar{x} - \bar{y}}{\sigma_{\bar{x}-\bar{y}}} = \dfrac{\text{Diff}}{\sigma_{\text{Diff}}}$     (11-2-1)

Fifth step: Look up the area A corresponding to this value of t.

Sixth step: Compute P from

P (of null hypothesis) = 1 - 2A     (11-2-2)

This value of P is the probability that a difference as large as or larger
than the observed difference could have arisen by chance, assuming that
the two samples came from the same universe. It is commonly interpreted
as the probability that the null hypothesis is correct, and if P is so small
as to be negligible, then the difference between the two means is proved to
be real.
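
The six steps translate directly into a short routine. The sketch below is one possible arrangement and not a prescribed method: the function name is hypothetical, the first step (means and standard deviations) is assumed to have been carried out already, the absolute value of the difference is used so that t is never negative, and the area A is obtained from the error function in place of the printed table.

```python
import math

def difference_of_means_test(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """Return (t, A, P) under the null hypothesis of a common universe."""
    # Second step: standard deviation of each mean (equation 10-7-4).
    sd_mean_x = sd_x / math.sqrt(n_x)
    sd_mean_y = sd_y / math.sqrt(n_y)
    # Third step: standard deviation of the difference (equation 10-4-2).
    sd_diff = math.sqrt(sd_mean_x**2 + sd_mean_y**2)
    # Fourth step: the observed difference in t units (equation 11-2-1).
    t = abs(mean_x - mean_y) / sd_diff
    # Fifth step: area between the mean and t under the normal curve.
    area = 0.5 * math.erf(t / math.sqrt(2.0))
    # Sixth step: equation 11-2-2.
    p = 1.0 - 2.0 * area
    return t, area, p

# The convalescence example: control group first, experimental group second.
t, area, p = difference_of_means_test(19.7, 6.1, 100, 17.1, 5.7, 100)
print(round(t, 2), round(area, 4), round(p, 4))
```

Applied to the convalescence data, the routine returns the same t, A, and P as the hand computation, to the number of places carried in the text.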

If the probability is small, but not so small as to be ignored, then we
must state our conclusions accordingly. An example of such a less favorable
probability is the following. A farmer who has been raising one variety of
tomatoes believes that he might be able to obtain an increased yield with
a new variety. He plants nine of the new variety (the experimental
group) and nine of the old variety (the control group), spacing them
alternately in a row to reduce the likelihood of systematic differences in growing
conditions of any kind. The yields per plant are as shown in Table 11-2-1.


Table 11-2-1. Yield from Two Varieties

Control Group (x)     Experimental Group (y)

       18                      14
       13                      16
       12                      16
       14                      16
       17                      14
       13                      12
       16                      15
       12                      18
       13                      12


The details of the computations are:

First step:    $\bar{x} = 14$     $\bar{y} = 15$
               $\sigma_x = 2.1$     $\sigma_y = 1.8$

Second step:   $\sigma_{\bar{x}} = 2.1/\sqrt{9} = 0.7$     $\sigma_{\bar{y}} = 1.8/\sqrt{9} = 0.6$

Third step:    $\sigma_{\bar{x}-\bar{y}} = \sqrt{0.7^2 + 0.6^2} = 0.92$

Fourth step:   t = (15 - 14)/0.92 = 1.09

Fifth step:    A = 0.362

Sixth step:    P = 1 - 2(0.362) = 0.276

Thus we see that there is a probability of 0.28 that a difference this large
or larger could have arisen solely by chance upon selecting two samples of
nine from the same universe, and, by implication, this tells us that the
probability that the null hypothesis is correct is 0.28, which is far too
large a probability to permit us to reject the hypothesis. Our conclusion
therefore is that no significant difference between the two varieties has
been proved to exist.

PROBLEMS 

1. Compute the probability that the difference between the mean of Table
1-4-1 and the mean of Table 1-4-2 is due to chance, upon the hypothesis that the
two sets of temperatures came from the same universe. What is your conclusion
about the effectiveness of the treatment given the experimental patients?

2. This year's freshman class, consisting of 538 men, had an average grade of
76.4, with a standard deviation of 17. Last year's class, with 620 men, had a grade
average of 77.4 with a standard deviation of 19. Is the difference between the two
classes significant?

3. The mean height of 1428 men in one geographical region is 66.4 inches, with
a standard deviation of 2.7 inches, while that of 1193 men in another region is
66.7, with a standard deviation of 2.5. Would you predict that further samples
of men from the two regions will show a similar difference?


3. CONFIDENCE LIMITS 


The conclusions in a study of the significance of a difference between
means can never be stated in absolute terms, since there always remains
a residual probability that the difference arose by chance. Nevertheless,
it is frequently convenient to adopt arbitrary limits of significance, and
several such arbitrary limits are widely used. One such set of rules is the
following:

If P is greater than 0.20, no significance is indicated
If P is between 0.05 and 0.20, the difference is probably
significant                                                    (11-3-1)
If P is less than 0.05, the difference is certainly significant


These results are frequently expressed in somewhat different terms, which
the student should be prepared to recognize in statistical reports. If P
is less than 0.05, the difference is said to be “significant at the 0.05 level,”
if P is less than 0.01, “at the 0.01 level,” and so forth. The limits most
frequently used are 0.1, 0.05, 0.02, 0.01, and 0.001, but the terminology
can be used to express any “level of significance.”

Another way of expressing the conclusions in terms of confidence limits
is to compute only t, and not A or P, and to interpret the results according
to the following table:

If t is less than 2.5, no significance is proved
If t is between 2.5 and 3.0, the difference is probably
significant                                                    (11-3-2)
If t is greater than 3.0, the difference is certainly significant

The fact that 11-3-1 disagrees completely with 11-3-2 emphasizes the
fact that all limits are arbitrary. The use of arbitrary limits is an inexact
way to express our conclusions, and in general it is better to state the
exact probability that the results could have occurred by chance. It is
obviously ridiculous, for example, to state that when t is 2.5001 the
difference is probably significant, but when t is 2.4999 there is no evidence
for significance, yet this is what the use of arbitrary limits compels us to do.
Another reason for stating our conclusions in terms of probability is the
fact that different situations require different degrees of certainty. Suppose
that a rope manufacturer installs some new machinery and performs a
statistical analysis to find whether the strength of the rope produced by
the new machinery is significantly different from that produced by the old
machinery. He finds a small difference, but reports to his customers that
“it is not significant.” One of his customers resells the rope in the form of
clotheslines and is completely satisfied with the conclusion. After all, if
there is one chance in a thousand that a cord will break, the consequences
are not very serious, and if a rope should break, the irate housewife could
be easily compensated for the resulting damage. Another customer buys
the rope for use in the manufacture of parachutes. Here the consequences
of a break are much more serious, and the user must know whether the
probability of a break under the proposed load is one in a thousand or one
in a million. The purchaser in this case would insist rightly upon a far
smaller probability that the rope would break under the proposed load
than would the first purchaser. The manufacturer should therefore state
his conclusions exactly in terms of the probability that the null hypothesis
is correct and leave it to the user to decide whether the probability is low
enough to meet his needs.
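
The disagreement between criteria 11-3-1 and 11-3-2 can be made concrete
by applying both to the same value of t. The sketch below is illustrative:
the function names are hypothetical, and the treatment of values falling
exactly on a boundary is an assumption, since the criteria as stated do not
settle it.

```python
from statistics import NormalDist


def verdict_from_p(p):
    """Criteria 11-3-1, stated in terms of P."""
    if p > 0.20:
        return "no significance"
    if p >= 0.05:
        return "probably significant"
    return "certainly significant"


def verdict_from_t(t):
    """Criteria 11-3-2, stated in terms of t."""
    if abs(t) < 2.5:
        return "no significance"
    if abs(t) <= 3.0:
        return "probably significant"
    return "certainly significant"


def p_from_t(t):
    """P = 1 - 2A: chance of a deviation of |t| or more in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


for t in (1.09, 2.2, 2.4999, 2.5001, 3.2):
    p = p_from_t(t)
    print(f"t = {t:<6}  P = {p:.4f}  "
          f"11-3-1: {verdict_from_p(p):<22} 11-3-2: {verdict_from_t(t)}")
```

A value such as t = 2.2 is "certainly significant" under one set of rules
and shows "no significance" under the other, which is the point of stating
the probability itself rather than a verdict.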


PROBLEMS 

1. Restate your answers to problems 1 and 3 in the preceding section in terms
of “levels of significance,” and then in terms of the criteria 11-3-1, and finally in
terms of the criteria 11-3-2.

2. Write a brief statement comparing the value of criteria 11-3-1 and 11-3-2
with each other, and both with the use of levels of significance.

4. DESIGN OF EXPERIMENTS 

In Article 2 we considered the question of whether the data in Table
11-2-1 proved that the experimental plants shown at the right of the table
differed significantly from those shown at the left. We concluded that
the difference between the two means was too small to be statistically
significant, and that no difference was proved to exist.

This does not mean that no difference between the two varieties exists.
It means only that if a difference does exist, the sample size chosen by the
farmer was too small to reveal it. The farmer may perhaps reason as
follows: “A difference of yield of one pound per plant is large enough to
justify replacing the old variety by the new one, since the two varieties
are alike in all other respects. It is therefore worth experimenting further
to discover whether this apparent difference is real or is only an accident
of sampling. How many experimental plants must I put in next year in
order to settle the question decisively?”

To answer this question, we must first decide exactly how “decisively”
the question must be answered. Suppose that we decide that it should
be answered at the 0.05 level of significance. This means that we wish
to design an experiment in such a way that, if such a difference between
the means continues to appear, it will be proved to be significant at the
0.05 level. To find the number of experimental plants necessary, we must
follow our previous procedure in the reverse order. If 1 - 2A is to equal
0.05, then A must equal 0.475. From the normal curve tables we find
that t must equal 1.96. Assuming that there is a real difference of 1.00
between the two varieties, equation 11-2-1 becomes

$$1.96 = \frac{1.00}{\sigma_{\bar{x}-\bar{y}}}$$

from which we see that $\sigma_{\bar{x}-\bar{y}}$ must equal 0.51. If we insert this in equation
10-4-2, applying it to the means, we have

$$0.51 = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2}$$

If we replace $\sigma_{\bar{x}}$ by $\sigma_x/\sqrt{N}$ and $\sigma_{\bar{y}}$ by $\sigma_y/\sqrt{N}$, according to equation 10-7-4,
this becomes

$$0.51 = \sqrt{\frac{\sigma_x^2}{N} + \frac{\sigma_y^2}{N}}$$

The standard deviations of the larger samples which the farmer plans for
next year will probably not differ much from the standard deviations of this
year's small samples, and we can replace $\sigma_x$ by 2.1 and $\sigma_y$ by 1.8. The
above equation can then be solved for N. The exact result is

$$N = \frac{2.1^2 + 1.8^2}{0.51^2} = 29$$

Our conclusion is then that if a true difference of about 1 pound per plant
does exist, it will require a sample size of at least twenty-nine plants to
prove it beyond a reasonable doubt.
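
The reverse procedure described in this article can be sketched in a few
lines. This is an illustration under stated assumptions: the function name
`plants_needed` is hypothetical, both groups are taken to be the same size N,
and the normal-curve value of t is obtained from `statistics.NormalDist`
rather than from printed tables.

```python
from statistics import NormalDist


def plants_needed(sigma_x, sigma_y, difference, level):
    """Size N of each group at which `difference` reaches the given level."""
    # Reverse of the usual procedure: level -> A -> t.
    t = NormalDist().inv_cdf(1 - level / 2)
    # Required standard deviation of the difference of the means.
    se_required = difference / t
    # Solve se_required = sqrt((sigma_x**2 + sigma_y**2) / N) for N.
    return (sigma_x**2 + sigma_y**2) / se_required**2


n = plants_needed(sigma_x=2.1, sigma_y=1.8, difference=1.00, level=0.05)
print(f"N = {n:.1f}")
```

The unrounded value lies a little above 29, so rounding up to the next
whole plant would be the cautious choice. A stricter level or a smaller
difference raises N quickly, since N varies as the square of t divided by
the difference.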


PROBLEMS

1. Suppose that the farmer in the above illustration believes that a proved
difference of \ pound per plant would be enough to justify the adoption of the
new variety, but that a smaller difference would not justify a change. How
large should his experimental group be to insure the proof of significance? (Use
P = 0.05 as before.)

2. Suppose that the farmer not only wishes to test for differences as small as
| pound, but is not satisfied with a result which permits one chance in twenty that
he is wrong. If he insists that the probability of the null hypothesis must be
reduced to 0.01 for any difference greater than § pound, how many plants should
he include in his next experiments?

3. In Problem 1, Article 7, Chapter 10, how many observations would be
necessary to reduce the probable error of the mean below 0.001? Below 0.0005?

4. A preliminary study of ten criminals indicates an average IQ of 89, with a
standard deviation of 15. For comparable sociological groups, the mean IQ of a
large number of non-criminals is known to be 96, with a standard deviation of 12.
If the difference in mean IQ is real, how many criminals should be included in
subsequent studies in order to demonstrate the reality of the difference at the
0.005 level of significance? At the 0.001 level?


5. HYPOTHESES CONCERNING VARIABILITY 


In Article 2 we discussed the procedure for finding whether the
difference between the means of two sets of data is compatible with the
hypothesis that the two sets came from the same universe. We will now consider
the problem of finding whether the difference between the standard
deviations of two sets of data is compatible with this hypothesis.

To see why this is useful, let us consider a specific application. In a
factory a bolt-cutting machine turns out bolts with a mean diameter of
0.2508 inch, with a standard deviation of 0.0017 inch. In a routine check
it was found that a sample of 100 bolts had a mean diameter of 0.2506,
which is well within the tolerance limits of size, but that the standard
deviation of the sample was 0.0027. The operator accepts this as evidence
that the machine has become worn to the point of being erratic in its
output. Is it reasonable to believe that the machine will continue to turn
out bolts with the higher variability, or is it more reasonable to believe
that the apparent high variability was due to chance and to expect future
samples to have a standard deviation of around 0.0017? Assuming that
an increase of variability of this size cannot be tolerated, should the
machine be replaced?

To answer this question, we must know the equation for the standard
deviation of a standard deviation, which we introduce here without
proof:

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2N}} \qquad (11\text{-}5\text{-}1)$$

To apply this to our problem, we must compute the standard deviation
of the standard deviation of 100 variates, if the standard deviation of the
universe is 0.0017. From 11-5-1 we have

$$\sigma_\sigma = \frac{0.0017}{\sqrt{2 \times 100}} = 0.00012$$





Thus we see that the observed standard deviation (0.0027) differs from the
predicted standard deviation (0.0017) by eight times its own standard
deviation, and the hypothesis that the sample of 100 came from the
specified universe is totally untenable. It is nearly impossible that the difference
could have arisen by chance, and we must conclude that the machine has
become defective.
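
The arithmetic of the bolt example can be checked with a short sketch. The
function name `sd_of_sd` is hypothetical; the formula is equation 11-5-1 as
used in the computation above.

```python
from math import sqrt


def sd_of_sd(sigma, n):
    """Equation 11-5-1: standard deviation of a sample standard deviation."""
    return sigma / sqrt(2 * n)


universe_sd = 0.0017   # long-run standard deviation of bolt diameters, inches
observed_sd = 0.0027   # standard deviation of the sample of 100 bolts
spread = sd_of_sd(universe_sd, 100)
deviation = (observed_sd - universe_sd) / spread
print(f"sd of sd = {spread:.5f}, deviation = {deviation:.1f} standard deviations")
```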

PROBLEMS 

1. In Problem 4, Article 4, we discussed the hypothesis that criminals have
lower IQ’s than non-criminals, in the mean. Do the figures given also indicate
that the IQ’s of criminals tend to be more variable than those of non-criminals?
Discuss this hypothesis.

2. Assuming that the difference between the standard deviations of IQ’s of
criminals and non-criminals is real, how many criminals would have to be studied
to demonstrate the reality at the 0.01 level? At the 0.001 level?

3. For Table 11-2-1, discuss the hypothesis that the experimental group is less
variable than the control group.

4. For the data in Tables 1-4-1 and 1-4-2, discuss the hypothesis that the
experimental treatment reduces the variability of temperatures.

5. In Problem 3, Article 2, Chapter 11, would you regard the difference between
the two standard deviations as significant?

6. STANDARD DEVIATION OF A FREQUENCY 

In the preceding articles we have studied the methods by which a study
of a sample can give us information about the range of reliability of
quantities deduced from the sample. The quantities so studied have all been
functions of the variates. In this article we shift our attention away
from the sizes of the variates and direct it instead to the frequencies of
occurrences of variates of various sizes. The purpose of this will be made
clear by an illustrative example.

Example 1. In a sample of 500 bolts from a day’s production of a bolt-making
machine, 15 were found to be defective. In earlier tests, the average number of
defective bolts had been about 10 per 500. Would you call in a repair man to
check the machine or would you attribute the increase to chance?

To solve this problem, we recall from the chapter on the normal curve that if p
is the probability that a given event will succeed, 1 - p is the probability that it will
not succeed, n is the number of trials, and s is the number of successes, then the
standard deviation of s is equal to $\sqrt{np(1-p)}$, by equation 6-6-1. In the
terminology we have used for frequency tabulations, f is the frequency of occurrence of
any given class, in other words, the number of “successes” with respect to that
class, N is the total number of variates, and f/N is the best guess we can make about
the probability of a success in the universe from which the sample was drawn.
In other words, equation 6-6-1 becomes

$$\sigma_f = \sqrt{N \cdot \frac{f}{N}\left(1 - \frac{f}{N}\right)}$$

or

$$\sigma_f = \sqrt{f\left(1 - \frac{f}{N}\right)} \qquad (11\text{-}6\text{-}1)$$

In many cases the frequency of the class in which we are interested is very small
in comparison with the total number of variates. In this case, f/N is nearly zero
and we can use the approximate formula

$$\sigma_f \approx \sqrt{f} \quad \text{(if } f/N \text{ is small)} \qquad (11\text{-}6\text{-}2)$$

where $\approx$ means “is approximately equal to.” For our illustrative problem, we
make the hypothesis that the day’s production is a random sample from a universe
in which 10 bolts per 500 are defective. The expected frequency of defective bolts
in the day’s production is therefore 10, with a standard deviation of $\sqrt{10}$ or
3.16. The observed frequency of 15 is therefore only 1.58 standard deviations
away from the expected value. The normal curve tables tell us that a deviation
this large or larger can be expected in about 11 per cent of the trials. We conclude
that the difference may easily be due to chance and that there is no justification
for overhauling the machine.
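
The computation in Example 1 can be sketched as follows. The function name
`sd_of_frequency` and its `approximate` flag are hypothetical conveniences
for switching between equations 11-6-1 and 11-6-2; the probability is taken
as two-sided, which matches the 11 per cent quoted above.

```python
from math import sqrt
from statistics import NormalDist


def sd_of_frequency(f, n, approximate=False):
    """Equation 11-6-1, or the approximation 11-6-2 when f/N is small."""
    return sqrt(f) if approximate else sqrt(f * (1 - f / n))


expected, observed, sample_size = 10, 15, 500
sd_approx = sd_of_frequency(expected, sample_size, approximate=True)
sd_exact = sd_of_frequency(expected, sample_size)
t = (observed - expected) / sd_approx
# Two-sided probability of a deviation this large or larger.
p = 2 * (1 - NormalDist().cdf(t))
print(f"sd (11-6-2) = {sd_approx:.2f}, sd (11-6-1) = {sd_exact:.2f}")
print(f"t = {t:.2f}, P = {p:.3f}")
```

With f/N equal to 0.02 the two formulas differ by about one per cent, which
is why the approximation is adequate here.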

Example 2. An identical examination was given in two classes. In the first
class, 8 students out of 32 made perfect scores, and in the second class, only 5 out
of 35 made perfect scores. Should you, on the basis of this evidence, expect
consistently better performance from the first class or should you attribute the
difference to chance?

Answer. As usual, we begin by adopting the hypothesis that the two samples
came from the same universe, and we investigate the consequences of this
hypothesis. The best guess about the hypothetical single universe is obtained by combining
the two samples, which tells us that 13 out of 67 made perfect scores, or 19.4 per cent.
On this basis we would expect 6.2 perfect scores in the first class, and 6.8 in the
second class. To obtain the standard deviations of these predictions, we
cannot use the approximate equation 11-6-2 because f/N is too large, and we use
instead equation 11-6-1. The standard deviation of the first predicted frequency
(6.2) is $\sqrt{6.2(1 - 0.194)}$, or 2.2, and that of the second is 2.3. The observed
frequencies are both less than one standard deviation from their expected values, and
we conclude that our hypothesis that the two samples came from the same universe
is tenable. In other words, we conclude that there is not sufficient reason to believe
that the first class will be consistently better than the second. This problem will
be treated more fully in Section 8.
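
Example 2 can be retraced in the same way. This is a minimal sketch; the
variable names are illustrative, and the standard deviations follow
equation 11-6-1 with the pooled proportion standing in for f/N.

```python
from math import sqrt

perfect = [8, 5]       # observed perfect scores in each class
enrolled = [32, 35]    # class sizes

# Pooled estimate of the probability of a perfect score in the single universe.
pooled = sum(perfect) / sum(enrolled)

results = []
for f_obs, n in zip(perfect, enrolled):
    f_pred = n * pooled                      # predicted frequency
    sd = sqrt(f_pred * (1 - f_pred / n))     # equation 11-6-1
    results.append((f_pred, sd, (f_obs - f_pred) / sd))

for f_pred, sd, t in results:
    print(f"predicted = {f_pred:.1f}, sd = {sd:.1f}, deviation = {t:+.2f} sd")
```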


PROBLEMS 

1. In a state with a population of six million, there were 984 automobile accident
fatalities last year. In an effort to reduce the number of accidents, the traffic
authorities increased their severity in dealing with violations. In the following
year there were only 951 fatalities, and the authorities claimed to have saved 33
lives by means of their campaign. Comment upon the validity of this claim.

2. A given city usually has about 15 cases of typhoid fever per year, but this
year there have been 28 cases. Should the health authorities investigate the
cause of the increase, or should they simply dismiss it as due to the random
variation to be expected from year to year?

3. A patient has a blood count of 13,200. This “count” is based upon the actual
counting of 264 blood cells in a standardized volume. Two days later his count
has risen to 13,800, based upon an actual count of 276 cells. Should this be
regarded as evidence of a change in his condition, or should it be attributed merely
to random differences between successive samples?


7. STANDARD DEVIATION OF A PERCENTAGE 

If the data are in the form of percentages, the procedure in Article 6
can be shortened somewhat by using a modified form of equation 11-6-1.
If we let P stand for the frequency of any group expressed as a percentage
of the total variates, that is, if we let P = 100f/N, then we can rewrite
equation 11-6-1 as follows. The standard deviation of P is 100/N times
the standard deviation of f, since 100/N is a constant (see equation
10-7-3), and we have

$$\sigma_P = \frac{100}{N}\sqrt{f\left(1 - \frac{f}{N}\right)}$$

or, if we replace f by PN/100 and simplify,

$$\sigma_P = \sqrt{\frac{P(100 - P)}{N}} \qquad (11\text{-}7\text{-}1)$$



As an illustration of this equation, let us consider the following problem.
On a public opinion survey, 37 per cent of the people polled in a given
state expressed a preference for a given tax measure. In one county of
that state, in which 1523 people were interviewed, only 32 per cent
expressed a preference for the measure. Can you reasonably predict, on
the basis of this poll, that the actual election will also show a lower
percentage of positive votes in the county than in the state, assuming that
there is no shift in public opinion between the time of the opinion poll and
the actual vote?

To answer this question, we must again begin by formulating a specific
hypothesis, and we choose the hypothesis which is most easily tested. In
this case we adopt (for testing) the hypothesis that the opinion in the
county is the same as that in the state, in other words, we assume that the
percentage of favorable votes in the county would have been 37 per cent
if all the voters had been polled. On this hypothesis, the actual percentage
of 32 per cent which we observed in the sample differed from the true
value of 37 per cent solely as a result of chance. To test this hypothesis
we compute the standard deviation of the percentage from equation 11-7-1;
it is $\sqrt{37(100 - 37)/1523}$, or 1.24 per cent. The observed value of 32 is
therefore 4.0 standard deviations away from the expected value, and the
probability that the difference could have arisen by chance under this hypothesis
is less than 0.0001. We conclude that the hypothesis is untenable, and that
we can safely predict (ignoring other sources of error) that the actual vote
in the county will agree with the opinion poll in showing a lower
percentage in the county than in the state.
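
The opinion-poll computation can be checked with a brief sketch. The
function name `sd_of_percentage` is hypothetical; it implements equation
11-7-1, and the probability is taken as two-sided.

```python
from math import sqrt
from statistics import NormalDist


def sd_of_percentage(p, n):
    """Equation 11-7-1: standard deviation of a percentage P based on N variates."""
    return sqrt(p * (100 - p) / n)


state_pct, county_pct, polled = 37, 32, 1523
sd = sd_of_percentage(state_pct, polled)
t = (county_pct - state_pct) / sd
# Probability of a deviation this large or larger in either direction.
prob = 2 * NormalDist().cdf(-abs(t))
print(f"sd = {sd:.2f} per cent, t = {t:.1f}, P = {prob:.6f}")
```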

PROBLEMS

1. An instructor in Ohio University gave an examination to a class of 74 members,
which met at 8 A.M. He told them that he planned to give the identical
examination to a section (containing 82 members) which was to meet later in the day, and
he asked them not to reveal the content of the examination. In the early section
31 men failed, and in the later section only 11 failed. The sections had been
evenly matched in ability on earlier examinations. Is it more reasonable to believe
that information was given to the second section by the first, or to believe that the
difference is due to chance?

2. In a sample of 100 students, it was found that 13 per cent smoked cigarette
A and only 11 per cent smoked cigarette B. Would you definitely expect cigarette
A to be more popular on the average in the entire student body?

3. If you believe that the difference in Problem 2 is significant, how many
students would you interview in order to be reasonably certain of establishing the
difference at the 0.01 level of significance?

4. In the 1946-47 term, the voting record* of the Supreme Court in cases
involving alleged civil rights violations was as follows:

Justice        For Claimed Right    Against Claimed Right
Rutledge              11                     1
Murphy                10                     1
Douglas                8                     4
Black                  8                     4
Burton                 3                     9
Jackson                2                     9
Frankfurter            2                    10
Reed                   2                    10
Vinson                 0                    12

Does this indicate a systematic and predictable difference of voting attitude in
civil rights cases between Murphy and Douglas? Between Douglas and Jackson?
Between Black and Frankfurter? Between Murphy and Reed?


*Reprinted from Truman Reshapes the Supreme Court by Irving Dilliard, December,
1949, by permission of the Atlantic Monthly and Mr. Dilliard. Copyright by the
Atlantic Monthly, 1949.

5. In the 1947-48 term, the voting record was as follows:

Justice        For Claimed Right    Against Claimed Right
Rutledge              26                     1
Murphy                25                     2
Douglas               23                     4
Black                 19                     7
Frankfurter           12                    15
Jackson                7                    20
Burton                 6                    21
Vinson                 6                    21
Reed                   4                    23

Is Rutledge's voting record significantly different between the two sessions?

6. Is Frankfurter’s record significantly different between the two sessions?
Assuming that such a difference exists, is it more reasonable to attribute it to a
change in his attitude or to a change in the nature of the cases under
consideration? (Hint: The hypothesis that there has been a change in the nature of the
cases under consideration can be tested by comparing the total votes of the entire
court in the first session with that in the second session.)


8. CHI-SQUARE TEST 

The procedure given in Article 6 is adequate for testing any hypothesis
which predicts the frequency of occurrence in any single class. Frequently,
however, the hypotheses which we wish to test contain predictions about
a set of frequencies. As an example of such a situation, let us consider
the problem of a set of dice which are to be tested for balance. One of
them is thrown 360 times, with the results shown in Table 11-8-1. Would
you conclude that the die is probably defective, or probably not
defective?

Table 11-8-1 Results of 360 Throws

Face    Frequency
6           58
5           42
4           87
3           61
2           63
1           49

In order to proceed, we must make a definite hypothesis which will
enable us to predict a set of frequencies. The hypothesis that the die is
defective is of course useless, since it does not lead to a definite set of
expected frequencies. We therefore adopt the opposite hypothesis, that
the die is perfectly balanced. This leads us to a set of predicted
frequencies, equal to 60 for each face. Now let us begin by considering the
probability that the number of sixes will be exactly 58. By equation 11-6-2,
the standard deviation of the expected frequency of 60 is 7.7. The
deviation of the observed frequency from the predicted frequency in t units is
(60 - 58)/7.7, or 0.26, and the probability per t unit of such a value of t is, by
equation 6-5-1,

$$\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(0.26)^2}$$

or 0.39. The probability of occurrence of exactly 58 sixes is obtained by
multiplying the probability per t unit by the number of t units contained
in the interval from 57.5 to 58.5, which is simply 1/7.7 or 0.13. The
probability of obtaining exactly 58 sixes is therefore 0.39 X 0.13 or 0.051. If we
generalize this, we have, for the probability of occurrence of a given
frequency $f_1$, whose predicted value is $f_p$,

$$p(f_1) = C e^{-\frac{1}{2}(f_1 - f_p)^2/f_p}$$

where C is used as an abbreviation for the constant multiplier. If $f_2$ is the
observed frequency of the second class, its probability will be given by
the same equation with $f_2$ replacing $f_1$, and with the meaning of $f_p$ changed
to indicate the predicted frequency of the second class. The probability
that the frequency of the first class will be $f_1$ and that of the second class
will be $f_2$ is the product of the two probabilities. If we continue this
throughout all of the classes, the probability that all of the frequencies
will be exactly those observed is


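$$P(f_1 \text{ and } f_2 \text{ and } \ldots) = C e^{-\frac{1}{2}[(f_1 - f_p)^2/f_p]} \times C e^{-\frac{1}{2}[(f_2 - f_p)^2/f_p]} \times \cdots$$

We can simplify this by adding the exponents of e, according to equation
3-3-1:

$$P(f_1 \text{ and } f_2 \text{ and } \ldots) = C^n e^{-\frac{1}{2}[(f_1 - f_p)^2/f_p + (f_2 - f_p)^2/f_p + \cdots]} \qquad (11\text{-}8\text{-}1)$$

Let us abbreviate the sum in the exponent by means of a single symbol:

$$\chi^2 = \sum (f - f_p)^2/f_p \qquad (11\text{-}8\text{-}2)$$

This symbol is read “Chi-squared.” With it, equation 11-8-1 becomes

$$P(\text{of observed set of frequencies}) = C^n e^{-\frac{1}{2}\chi^2} \qquad (11\text{-}8\text{-}3)$$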


This probability, however, is not a satisfactory measure of the degree of
success of our hypothesis, for two reasons.

1. Any probability of a set of specific frequencies will be low. The
probability of a frequency of 58, for example, was found to be only 0.051,
although it is obvious that 58 has a higher probability than any other
frequency except 59, 60, 61, and 62. We are not so much interested in the
probability of getting exactly 58 sixes as we are in knowing how its
probability compares with the probability of other possible frequencies, that is,
in whether it is in a region of relatively high probabilities or not. In the
discussion of the reality of a given difference between means, we met this
problem by computing the total probability that a difference this large or
larger would occur by chance, in other words, we computed the total
probability of occurrence of all differences which were less likely than the
observed difference. If we apply the same principle here, we must sum
equation 11-8-3 for all sets of frequencies which are less likely than the
observed set. Since equation 11-8-3 itself measures the likelihood of
occurrence of a set of frequencies, this means that we must sum it for all
values of $\chi^2$ which are larger than the observed value of $\chi^2$.

2. We must make a distinction between a hypothesis which is
independent of the observations and one which is shaped in part by the
observations. It is always possible to contrive a hypothesis which is adjusted to
fit the sample perfectly if we are permitted to make the hypothesis elaborate
enough, but the success of the fit should not be accepted as a measure of
the probability of the hypothesis. For instance, we could make the
following hypothesis: The die is defective in such a way that a four comes
up 24.2 per cent of the time, while the others are all equally likely. In
this way we have taken care of the largest discrepancy and the quality
of the fit is greatly improved. The probability of a large deviation is now
greatly reduced, and we must modify equation 11-8-3 to express this fact.
If we had adjusted two constants, instead of one, to bring about an exact
agreement with the observations, we would have to reduce the probability
of large deviations still farther. To accomplish this modification of
equation 11-8-3 we must make use of a concept called the number of degrees of
freedom, which is the number of classes for which we are comparing the
observed to the predicted frequencies, minus the number of constants in
the hypothesis which are adjusted to fit the data.
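
The first of these two points can be seen numerically by tabulating the
approximate probability of each exact number of sixes near the predicted
value of 60. The sketch below is illustrative: the function name is
hypothetical, and it uses the same approximation as the discussion of 58
sixes (a standard deviation of the square root of 60, from equation 11-6-2),
carried without intermediate rounding, so the figure for 58 comes out near
0.050 rather than the rounded 0.051.

```python
from math import exp, pi, sqrt

predicted = 60               # expected sixes in 360 throws of a balanced die
sd = sqrt(predicted)         # approximation 11-6-2, as used in the text


def prob_exact_frequency(f):
    """Normal-curve height at f, times the width (in t units) of one step."""
    t = (f - predicted) / sd
    height = exp(-0.5 * t * t) / sqrt(2 * pi)
    return height / sd


table = {f: prob_exact_frequency(f) for f in range(55, 66)}
for f, p in table.items():
    print(f"{f}: {p:.4f}")
```

Every individual frequency has a probability of only a few per cent, even
the most likely one, which is why the probability of the exact observed set
is not a useful measure by itself.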

The modifications of equation 11-8-3 to allow for the number of degrees
of freedom, and the summing of the resulting equation to measure the
probability of all equally likely or less likely distributions, are too complex
mathematically for us to discuss here. The resulting probability equation
is very complex, and the values of the probability are in practice usually
obtained from precomputed tables rather than from the equation. Such
a table is contained in Appendix VI. To use it, we find the number of
degrees of freedom in the left-hand column headed “n,” we look across this
line until we find the value nearest to the observed value of $\chi^2$, and we
then read at the top of the corresponding column the total probability of
occurrence of all equally likely or less likely distributions. The use of the
tables is straightforward and simple except for the determination of the
number of degrees of freedom. This can be made clear most rapidly by
means of illustrative examples.

I. Let us first test the hypothesis that the die described in Table 11-8-1
is correctly balanced. The procedure (demonstrated in Table 11-8-2) is
as follows:

(1) Tabulate the observed and the predicted frequencies of each class
(columns 2 and 3 in Table 11-8-2).

(2) If any class contains fewer than five variates, it is advisable to
combine it with an adjacent class. (This rule is not applicable to Table 11-8-2
but is included here for future reference. For its use, see the example in
Table 11-8-3.)

(3) List the differences between each observed frequency and the
corresponding predicted frequency (column 4 of Table 11-8-2).

(4) List the squares of these differences (column 5).

(5) Divide each of these squares by the corresponding predicted
frequency (column 6).

(6) Add this column, obtaining $\chi^2$.

Table 11-8-2 Chi-Square Test

x     f     f_p    f - f_p    (f - f_p)^2    (f - f_p)^2/f_p
6     58    60       -2            4              0.07
5     42    60      -18          324              5.40
4     87    60       27          729             12.15
3     61    60        1            1              0.02
2     63    60        3            9              0.15
1     49    60      -11          121              2.02
                                                 -----
                                                 19.81

Number of classes: 6
Number of adjusted constants: 1 (total throws = 360)
Degrees of freedom: 5
$\chi^2$ = 19.81
Prob. < 0.01



(7) Count the number of constants contained in the hypothesis or the
predicting equation which have been adjusted to fit the sample exactly.
For the data in Table 11-8-2, the hypothesis is that all faces are equally
likely, and in order to predict the frequency it is necessary to use the
fact that the total number of throws must be 360. This number is fitted
exactly to the data in the sample. The number of adjusted constants is
therefore 1.

(8) Subtract this number from the final number of classes used in
computing $\chi^2$, obtaining n, the number of degrees of freedom. In the
example, there are six classes and one adjusted constant, therefore there
are five degrees of freedom.

(9) Find this value of n in the left-hand column of the tables; find the
tabulated value on this row which is nearest to the observed value of $\chi^2$;
read the probability at the top of this column. In our example we find,
opposite n = 5, that the tabulated value of $\chi^2$ which is nearest to our
observed value (19.81) is 20.52. From the top of the column we see that
the probability is therefore between 0.01 and 0.001, and nearer to the
latter. Thus we see that the probability that the observed frequency
could have occurred by chance if the die were perfectly balanced, plus
all the probabilities that any other equally likely or less likely frequency
could have occurred by chance, totals much less than 0.01. Thus if we
accept the hypothesis that the die is balanced, we are forced to accept
with it the conclusion that in selecting our sample we chanced upon an
excessively unlikely distribution, so unlikely in fact that in scarcely more
than one trial in a thousand would we expect to obtain this distribution
or any other distribution of comparable likelihood. Being unwilling to
accept the consequence, we are forced to relinquish the hypothesis, and
we conclude that the hypothesis that the die is properly balanced must
be rejected.

(10) The probability obtained in step 9 can be regarded as the
conclusion of the $\chi^2$ test and can be used as it stands as the basis for a practical
decision about the hypothesis. However, some workers prefer to use an
arbitrary scale such as the following:

If P is greater than 0.1, the hypothesis is acceptable
If P is between 0.1 and 0.05, the hypothesis is doubtful
If P is less than 0.05, the hypothesis is not acceptable
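
The ten steps above can be sketched in code for the die data. This is an
illustration, not the method of Appendix VI: in place of the printed table,
the tail probability is computed from the series for the incomplete gamma
function, and the function names `chi_square` and `chi_square_tail` are
hypothetical.

```python
from math import exp, lgamma, log


def chi_square(observed, predicted):
    """Equation 11-8-2: sum of (f - f_p)^2 / f_p over all classes."""
    return sum((f - fp) ** 2 / fp for f, fp in zip(observed, predicted))


def chi_square_tail(x, dof, terms=500):
    """Probability of a chi-square of x or larger, for dof degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half = dof / 2, x / 2
    # Series: P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_k z^k / ((a+1)...(a+k))
    term = total = 1.0
    for k in range(1, terms):
        term *= half / (a + k)
        total += term
    lower = exp(a * log(half) - half - lgamma(a + 1)) * total
    return 1 - lower


observed = [58, 42, 87, 61, 63, 49]            # faces 6, 5, 4, 3, 2, 1
predicted = [sum(observed) / 6] * 6            # balanced die: 60 per face
dof = len(observed) - 1                        # one adjusted constant (the total)
x2 = chi_square(observed, predicted)
p = chi_square_tail(x2, dof)
print(f"chi-square = {x2:.2f}, degrees of freedom = {dof}, P = {p:.4f}")
```

Without rounding each term to two decimals the sum is 19.80 rather than
19.81; the probability falls between 0.001 and 0.01, nearer the former, in
agreement with step 9.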

II. As a second example of the use of the chi-square test, let us
consider the data in Table 6-9-1. This frequency tabulation appears to fit
the normal curve fairly well, and the question arises as to whether the
universe from which this sample was drawn is or is not normal, within the
limits of observation. We must begin as usual by making a specific
hypothesis which will permit the prediction of a set of frequencies; for this
reason we adopt the hypothesis that the distribution is normal rather than
that it is not. The chi-square test of this hypothesis is shown in Table
11-8-3.


Table 11-8-3 Hypothesis of Normal Distribution

f            f_p     f - f_p    (f - f_p)^2    (f - f_p)^2/f_p
1, 2, 2      6.8      -1.8          3.2             0.47
10           9.2      +0.8          0.6             0.07
15          16.4      -1.4          2.0             0.12
27          25.2      +1.8          3.2             0.13
37          31.2      +5.8         33.6             1.08
30          33.4      -3.4         11.6             0.35
34          30.4      +3.6         13.0             0.43
18          23.3      -5.3         28.1             1.21
13          13.4      -0.4          0.2             0.01
10           8.2      +1.8          3.2             0.39
4, 2, 1      5.9      +1.1          1.2             0.20
                                                    ----
                                         $\chi^2$ = 4.46

Number of classes: 11
Adjusted constants: 3 ($\bar{x}$, $\sigma$, N)
Degrees of freedom: 8
Probability of an equally or less likely set of frequencies: > 0.5


To carry out the test, we begin by computing the predicted frequencies 
for each class. This step has been carried out in Table 6-9-1 and will not be 
repeated here. We next combine the small classes as shown in Table 
11-8-3, so that there will be no predicted frequencies less than five. Finally, 
we must count the adjusted constants, in order to obtain the number of 
degrees of freedom. 

To count the adjusted constants, let us review the procedure of fitting 
a normal curve. To compute the predicted frequencies we first convert 
the values of x into t units from the equation $t = (x - \bar{x})/\sigma$. In this 
equation, both $\bar{x}$ and $\sigma$ have been computed from the observations and are 
therefore adjusted constants. Later, when the probabilities per class have 
been obtained, these are multiplied by N, the total number of observations, 
in order to obtain the predicted frequencies. Since N is determined by the 
sample size, it is also an adjusted constant. The total number of adjusted 
constants is therefore three ($\bar{x}$, $\sigma$, and N), and the number of degrees of 
freedom is therefore 11 - 3, or 8. We see that the final probability is 
larger than 0.50, from which we conclude that the hypothesis of a normal 
distribution is acceptable. 
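
The arithmetic of Table 11-8-3 can be checked with a short Python sketch. It is offered only as an illustration: the function names are our own, and the tail probability uses the standard closed form for the chi-square distribution, which holds only when the number of degrees of freedom is even (as it is here). The frequencies are those of the table after the small classes have been combined.

```python
import math

# Observed and predicted frequencies from Table 11-8-3
# (small classes at each end already combined).
observed = [5, 10, 15, 27, 37, 30, 34, 18, 13, 10, 7]
predicted = [6.8, 9.2, 16.4, 25.2, 31.2, 33.4, 30.4, 23.3, 13.4, 8.2, 5.9]


def chi_square(obs, pred):
    """Sum of (f - f_p)^2 / f_p over all classes."""
    return sum((f - fp) ** 2 / fp for f, fp in zip(obs, pred))


def chi_square_tail_even_df(x, df):
    """P(chi-square >= x); closed form valid only for even df (an assumption here)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


adjusted_constants = 3                      # x-bar, sigma, and N
df = len(observed) - adjusted_constants     # 11 - 3 = 8
chi2 = chi_square(observed, predicted)
p = chi_square_tail_even_df(chi2, df)

print(f"chi-square = {chi2:.2f}, degrees of freedom = {df}, P = {p:.2f}")
```

The probability comes out well above 0.5, in agreement with the conclusion that the hypothesis of a normal distribution is acceptable.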


PROBLEMS 

1. Using the chi-square test, answer question XI, Article 4, Chapter 1. 

2. In the data in Problem 4 of the preceding article, test the following hypothesis: 
The Justices are all alike in their attitudes toward civil rights cases, and the 
differences in their voting records are due to chance variations rather than to 
systematic and predictable differences of attitude. (Hint: Assign a half-vote in 
cases of abstention, and consider only the "for" votes.) 

3. Test the hypothesis that the attitudes of all the Justices toward civil rights 
cases have remained unchanged between the 1946-1947 session and the 1947-48 
session. (Hint: Use the percentage of "for" votes in the first session as a basis 
for predicting votes in the second session.) 

4. Test the hypothesis that the attitudes of all the Justices except Frankfurter 
have remained unchanged between the two sessions. 

5. Test the hypothesis that the two sets of temperatures in Tables 1-4-1 and 
1-4-2 came from the same universe. (Hint: Compute the percentages of 
temperatures in the various classes in the first set, and use these as a basis for 
predicting the frequencies in these classes in the second set.) 

6. Test the hypothesis that the temperatures in Table 2-3-2 (and Figure 2-4-1) 
were drawn from a normal universe. 

7. Test the hypothesis that the distribution which Table 4-3-2 represents is 
normal. (Note that the predicted frequencies are given in the answer to Problem 
1, Article 9, Chapter 6.) 

9. SUMMARY 

When an investigator studies a sample, and deduces from it the properties 
which he believes the universe to have, he is forming a hypothesis which 
may or may not be true. The likelihood of such a hypothesis is determined 
indirectly by computing the likelihood that a sample drawn at random 
from such a universe will have the properties possessed by the actual 
sample. 

The simplest such hypothesis arises when the investigator measures the 
properties of two samples, obtained under somewhat different conditions, 
and raises the question of whether or not there is a significant difference 
between the two, that is, of whether they could have come from the same 
universe or whether they probably came from two different universes. To 
answer this important question we compute the probability that, if two 
random samples are drawn from such a hypothetical single universe, the 
difference will be as large as or larger than the observed difference. The 
procedural details of this computation are summarized near the end of 
Article 2. 

If it is found that the difference is not proved to be significant, but that 
there is some reason to believe that a significant difference may exist, then 
it may be desirable to plan a further experiment to answer the question 
decisively by increasing the size of the samples. The required sample size 
can be computed by the procedures described in Article 4. 

In addition to forming hypotheses about the arithmetic mean of the 
universe, it is sometimes necessary to make a hypothesis about the 
dispersion of the universe. Such hypotheses can be tested by using the 
equation for the standard deviation of a standard deviation. The use of 
this equation is illustrated in Article 5. 

Another kind of hypothesis about a universe is that concerning the 
size of a given frequency or a set of frequencies. To test such hypotheses 
we use the equation for the standard deviation of a frequency, $\sigma_f$. 
If the frequency is expressed as a percentage (P) of the variates, then we 
use the corresponding equation for its standard deviation, $\sigma_P$. 
The use of these equations is discussed in Articles 6 and 7. 

If the hypothesis about the universe consists of an assertion about the 
relative abundance of the variates in several classes, then we must use a 
somewhat more elaborate test, called the chi-square test. To perform this 
test, we first compute the frequencies to be expected in the various classes 
if the hypothesis is correct and then compare these predicted frequencies 
with the actual observed frequencies. The details of the test are summarized 
in Article 8. 



CHAPTER 12 

MULTIPLE AND PARTIAL CORRELATION 


1. INTRODUCTION 

In Chapter 9 we studied the methods by which we can predict the value 
which any variable will have if we know the corresponding value of any 
related variable. We showed how the range of uncertainty of such a 
prediction could be estimated, and, finally, we set up a measure of the 
degree of relatedness of the two variables. This theory was adequate for 
any problem involving only two variables, but if we have instead three 
variables, all related to each other, several important questions arise. 

(1) If we wish to predict a variable z and know the values of two 
related variables x and y, then the methods of Chapter 9 enable us to 
predict z from either x or y, but not both. In either case we may waste 
some potentially useful information. How can we make a prediction 
based upon x and y simultaneously and thus combine the usefulness of 
the two variables? 

(2) How can we measure the total effectiveness of such a joint prediction; 
that is, how can we measure the degree of relatedness between z on the 
one hand and x and y together on the other? 

(3) If x affects y, which in turn affects z, then there will be a statistical 
relationship between x and z. If, in addition, x affects z directly, this 
relationship will be still stronger. It would be helpful in untangling the 
cause-effect relationships involved if we could distinguish between these 
two situations. Is there any way to separate them mathematically? 

The theory of multiple and partial correlation will cast some light upon 
these and related questions. 

2. PREDICTION FROM TWO VARIABLES 

The problem of predicting one variable from known values of two related 
variables can best be discussed by following a specific example. In a study 
of the factors affecting the grades of 450 students at Syracuse University, 
the intelligence of each student was measured by standard tests, and the 
number of hours per week which each student studied was recorded. If 
we let I stand for intelligence, S for hours of study per week, and G for 
the cumulative grade of any student, measured in honor points, then the 
results of the study were as follows:* 

$$\bar{I} = 100.6 \qquad \sigma_I = 15.8 \qquad r_{IG} = +0.60$$ 

$$\bar{S} = 24 \qquad \sigma_S = 6 \qquad r_{SG} = +0.32$$ 

$$\bar{G} = 18.5 \qquad \sigma_G = 11.2 \qquad r_{IS} = -0.35$$ 

where the bars above the letters indicate arithmetic means, and where the 
terms $r_{IG}$, $r_{SG}$, and $r_{IS}$ stand for the ordinary correlation coefficients 
between these variables in pairs. 

Now let us set ourselves the problem of making the best possible 
prediction of the grade which any particular student might be expected to 
make, in the light of what we know about his intelligence and his study 
habits. In particular, if a student named John Schreiber has an intelligence 
measure of 121 and tells us that he usually studies about twenty-nine 
hours per week, what grade should he expect to make? 

Before approaching this problem directly, let us see what can be done 
with it by the methods of Chapter 9. If we use equation 9-7-1 to predict 
his grade from his intelligence alone, we have 

$$G_p = \bar{G} + \frac{\sigma_G}{\sigma_I}\, r_{IG}(I - \bar{I})$$ 

or, for our present problem, 

$$G_p = 18.5 + \frac{11.2}{15.8}(0.60)(121 - 100.6) = 27.2$$ 

The standard error of estimate of this prediction is given by 9-9-1: 

$$S_G = \sigma_G\sqrt{1 - r_{IG}^2} = 11.2\sqrt{1 - 0.60^2} = 9.0$$ 

This tells us that the best predicted value for Mr. Schreiber's grade 
average is 27.2, or 8.7 points above the average grade. But the standard 
error of estimate of this prediction is 8.96, and we see that the prediction 
is not very reliable. The probability, for example, that Mr. Schreiber will 
have a grade below the average, in spite of the prediction that his grade 
will be far above the average, is 0.166, as we find by the methods of Article 
9 of Chapter 9. 
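
The single-variable computation just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the text's own procedure: the function names are ours, and the probability of a below-average grade is obtained by assuming that the errors of prediction are normally distributed with standard deviation equal to the standard error of estimate.

```python
import math


def predict_from_one(mean_y, sd_y, mean_x, sd_x, r, x):
    """Equation 9-7-1: predicted y for a given x."""
    return mean_y + (sd_y / sd_x) * r * (x - mean_x)


def standard_error_of_estimate(sd_y, r):
    """Equation 9-9-1: standard error of estimate for a one-variable prediction."""
    return sd_y * math.sqrt(1.0 - r ** 2)


def normal_cdf(t):
    """Cumulative probability of the standard normal curve."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


mean_G, sd_G = 18.5, 11.2      # grades, in honor points
mean_I, sd_I = 100.6, 15.8     # intelligence measure
r_IG = 0.60

g_pred = predict_from_one(mean_G, sd_G, mean_I, sd_I, r_IG, 121)
s_G = standard_error_of_estimate(sd_G, r_IG)

# Probability that the actual grade falls below the mean (normality assumed).
p_below_mean = normal_cdf((mean_G - g_pred) / s_G)

print(f"predicted grade = {g_pred:.1f}, standard error = {s_G:.2f}, "
      f"P(below average) = {p_below_mean:.3f}")
```

The same two functions apply unchanged to a prediction from hours of study alone; only the mean, standard deviation, and correlation coefficient supplied to them differ.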

If we follow the alternative procedure of predicting his grade average 
solely from his hours of study, we have the following result: 

$$G_p = \bar{G} + \frac{\sigma_G}{\sigma_S}\, r_{GS}(S - \bar{S}) = 21.5$$ 

and 

$$S_G = \sigma_G\sqrt{1 - r_{GS}^2} = 10.6$$ 

Here we find that the predicted grade is only three units above the average, 
and the standard error of estimate is even larger than before! Clearly 
what we need is a method of making a single prediction which utilizes all 
the available information. This is the first objective of the theory of 
multiple correlation. 

*Based upon data from "Predicting Academic Success" by Mark A. May, Journal 
of Educational Psychology, 1923, Volume 14, pages 7 and 429, by permission of the 
publishers. 

The mathematical treatment of prediction from two variables follows 
the same principles as the treatment of prediction from a single variable, 
but the details are much lengthier. In view of this similarity, you should 
review the corresponding treatment of prediction from a single variable, 
using the following outline as a guide. 

(1) In Chapter 8, Article 6, we showed that an important property 
which the prediction equation must possess is that it must minimize the 
sum of the squares of the errors of prediction. 

(2) In Chapter 8, Article 8, we showed that for a prediction equation 
of the form $y_p = mx + b$, the values which m and b must have in order for 
the least squares criterion to be fulfilled are 

$$m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x^2} \qquad \text{and} \qquad b = \bar{y} - m\bar{x}$$ 

(3) In Chapter 9, Article 7, we showed that the resulting prediction 
equation could be simplified by introducing r, giving us finally the very 
simple form shown in equation 9-7-1, 

$$y_p = \bar{y} + \frac{\sigma_y}{\sigma_x}\, r(x - \bar{x})$$ 

or, if we transpose the $\bar{y}$ and divide by $\sigma_y$, this takes the symmetrical form 

$$\frac{y_p - \bar{y}}{\sigma_y} = r\,\frac{x - \bar{x}}{\sigma_x} \tag{12-2-1}$$ 

Now to extend this procedure to the prediction of a third variable, z, 
from known values of x and y, we begin by writing an equation similar to 
12-2-1, but containing one additional variable: 

$$\frac{z_p - \bar{z}}{\sigma_z} = A\,\frac{x - \bar{x}}{\sigma_x} + B\,\frac{y - \bar{y}}{\sigma_y} \tag{12-2-2}$$ 

where A and B are as yet unknown multipliers which we must choose in 
such a way that the criterion of least squares is satisfied. We must, in 
other words, choose A and B in such a way as to make the sum of the 
squares of the errors of prediction, that is, $\Sigma(z - z_p)^2$, as small as possible. 
If we insert the value given by 12-2-2 for $z_p$, we have 

$$\Sigma(z - z_p)^2 = \Sigma\left[z - \bar{z} - \sigma_z\left(A\,\frac{x - \bar{x}}{\sigma_x} + B\,\frac{y - \bar{y}}{\sigma_y}\right)\right]^2 \tag{12-2-3}$$ 

or, if we divide both sides by N in order to secure the advantages of working 
with arithmetic means, and remove $\sigma_z^2$ as a factor, 

$$\overline{(z - z_p)^2} = \sigma_z^2\,\overline{\left[\frac{z - \bar{z}}{\sigma_z} - A\,\frac{x - \bar{x}}{\sigma_x} - B\,\frac{y - \bar{y}}{\sigma_y}\right]^2}$$ 

The right-hand side consists of three terms. If we square it out, leaving 
each of these three terms intact, we will obtain three squares and three 
cross products. The first of the square terms will be $\overline{(z - \bar{z})^2}/\sigma_z^2$, which 
is equal to $\sigma_z^2/\sigma_z^2$, or one. The first of the cross terms is 
$-2A\,\overline{(x - \bar{x})(z - \bar{z})}/\sigma_x\sigma_z$, which is equal to $-2Ar_{xz}$, as we can see from 
equation 9-6-2. If we simplify all the six terms in this way, we will have 

$$\overline{(z - z_p)^2} = \sigma_z^2(1 - 2Ar_{xz} + A^2 - 2Br_{yz} + 2ABr_{xy} + B^2) \tag{12-2-4}$$ 

Following the principles used in Article 8, Chapter 8, we group these 
terms as follows: 

$$\overline{(z - z_p)^2} = \sigma_z^2[A^2 - 2A(r_{xz} - Br_{xy}) + \text{terms not containing } A]$$ 

$$= \sigma_z^2[A - (r_{xz} - Br_{xy})]^2 + \text{terms not containing } A$$ 

The term $\sigma_z^2[A - (r_{xz} - Br_{xy})]^2$ can never be negative, since it is squared, 
and the smallest value which it can possibly take on is zero. To give it 
this smallest value, we must give A the value 

$$A = r_{xz} - Br_{xy}$$ 

In the same way, we find that the value which we must give B to make 
the mean in 12-2-4 a minimum is 

$$B = r_{yz} - Ar_{xy}$$ 

These two equations are not yet useful for finding A and B because the 
right-hand sides contain the quantities which we are trying to find. If 
we combine the two equations and solve for A and B, we obtain 

$$A = \frac{r_{xz} - r_{xy}r_{yz}}{1 - r_{xy}^2} \qquad\qquad B = \frac{r_{yz} - r_{xy}r_{xz}}{1 - r_{xy}^2} \tag{12-2-5}$$ 

These two equations, together with 12-2-2, enable us to make a prediction 
of the most likely value of any variable from two other variables related 
to it. For the problem of predicting grades, these become 

$$A = \frac{r_{IG} - r_{IS}r_{SG}}{1 - r_{IS}^2} = 0.81$$ 

$$B = \frac{r_{SG} - r_{IS}r_{IG}}{1 - r_{IS}^2} = 0.60$$ 

$$\frac{G_p - \bar{G}}{\sigma_G} = 0.81\,\frac{I - \bar{I}}{\sigma_I} + 0.60\,\frac{S - \bar{S}}{\sigma_S} \tag{12-2-6}$$ 

which gives us, for the case of Mr. Schreiber, a predicted grade as follows: 

$$\frac{G_p - 18.5}{11.2} = 0.81\,\frac{121 - 100.6}{15.8} + 0.60\,\frac{29 - 24}{6}$$ 

or, upon solving, $G_p = 35.8$. 
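
As a check on equations 12-2-5 and 12-2-6, the multipliers and the joint prediction can be computed directly. The following Python sketch is illustrative only; the function names are our own. Because it carries A and B at full precision rather than rounding them to two places, its predicted grade differs slightly in the last digit from the value obtained above.

```python
def standardized_weights(r_xz, r_yz, r_xy):
    """Equation 12-2-5: least-squares multipliers A (for x) and B (for y)."""
    denom = 1.0 - r_xy ** 2
    A = (r_xz - r_xy * r_yz) / denom
    B = (r_yz - r_xy * r_xz) / denom
    return A, B


def predict_from_two(mean_z, sd_z, A, B, x, mean_x, sd_x, y, mean_y, sd_y):
    """Equation 12-2-2 solved for the predicted value of z."""
    return mean_z + sd_z * (A * (x - mean_x) / sd_x + B * (y - mean_y) / sd_y)


# Data of the grade study: x = intelligence (I), y = study hours (S), z = grade (G).
r_IG, r_SG, r_IS = 0.60, 0.32, -0.35
A, B = standardized_weights(r_IG, r_SG, r_IS)

g_pred = predict_from_two(18.5, 11.2, A, B,
                          121, 100.6, 15.8,   # Mr. Schreiber's intelligence measure
                          29, 24, 6)          # his hours of study per week

print(f"A = {A:.2f}, B = {B:.2f}, predicted grade = {g_pred:.1f}")
```

The two multipliers also satisfy the pair of conditions from which they were derived, A = r_xz - B r_xy and B = r_yz - A r_xy, which offers a quick test of any hand computation.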

To find the confidence limits of this prediction, we must derive an 
equation for the standard error of estimate. This quantity, it will be 
recalled, is defined as the square root of the mean of the squares of the 
errors of prediction: 

$$S_z = \sqrt{\overline{(z - z_p)^2}}$$ 

We will use the same definition here, but will now use the notation $S_{z.xy}$, as 
a reminder that z is now being predicted from both x and y. We can obtain 
an equation for $S_{z.xy}$ by taking the square root of both sides of equation 
12-2-4, and then substituting for A and B the values given by equations 
12-2-5. The result, after simplifying, is 

$$S_{z.xy} = \sigma_z\sqrt{1 - \frac{r_{xz}^2 - 2r_{xz}r_{yz}r_{xy} + r_{yz}^2}{1 - r_{xy}^2}} \tag{12-2-7}$$ 

which is the required equation for the standard error of estimate for 
prediction from two variables. If we apply this to our illustrative example 
of the prediction of grades, we have 

$$S_{G.IS} = \sigma_G\sqrt{1 - \frac{r_{IG}^2 - 2r_{IG}r_{IS}r_{SG} + r_{SG}^2}{1 - r_{IS}^2}} = 6.3$$ 

Thus we see that the standard error of estimate of grades is much lower when 
we predict them from both intelligence and study than when we predict them 
from either factor alone. If we express our results in terms of probable error, 
we see that Mr. Schreiber's predicted grade is 35.8 ± 4.2, whereas in the 
prediction from intelligence alone it was 27.2 ± 6.1, and from study alone 
it was 21.5 ± 7.1. If, instead, we express our results in terms of 
confidence limits at the 0.10 level, then the result tells us that we can expect 
Mr. Schreiber's grade to be between 25.8 and 47.0. 
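
The comparison of the three standard errors of estimate can be reproduced with the short Python sketch below. It is only an illustration: the function names are ours, and the conversion to probable error uses the familiar factor 0.6745 for a normal distribution, which is an assumption about how the figures quoted above were obtained.

```python
import math


def standard_error_one_predictor(sd_z, r):
    """Equation 9-9-1: standard error of estimate from a single predictor."""
    return sd_z * math.sqrt(1.0 - r ** 2)


def standard_error_two_predictors(sd_z, r_xz, r_yz, r_xy):
    """Equation 12-2-7: standard error of estimate when z is predicted from x and y."""
    explained_fraction = (r_xz ** 2 - 2 * r_xz * r_yz * r_xy + r_yz ** 2) / (1.0 - r_xy ** 2)
    return sd_z * math.sqrt(1.0 - explained_fraction)


sd_G = 11.2
r_IG, r_SG, r_IS = 0.60, 0.32, -0.35

s_intelligence = standard_error_one_predictor(sd_G, r_IG)
s_study = standard_error_one_predictor(sd_G, r_SG)
s_both = standard_error_two_predictors(sd_G, r_IG, r_SG, r_IS)

PROBABLE_ERROR_FACTOR = 0.6745  # half of a normal population lies within this many sigmas

for label, s in [("intelligence alone", s_intelligence),
                 ("study alone", s_study),
                 ("both together", s_both)]:
    print(f"{label:>20}: standard error = {s:.2f}, "
          f"probable error = {PROBABLE_ERROR_FACTOR * s:.1f}")
```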


PROBLEMS 

1. Complete the derivation of equations 12-2-5, supplying and explaining all 
the missing steps. 

2. (More difficult) Complete the derivation of equation 12-2-7. 

3. Using the data in Article 2, predict the grade which you would expect from a 
student with an intelligence measure of 87 who studies 14 hours per week. 

4. What is the probability that the student in Problem 3 will actually make a 
grade of 20 or above? 

5. Using the results of Problems 1 and 3, Article 11, Chapter 9, write the 
equation for predicting mathematics grades from both the old entrance test and the 
experimental test used together. 

6. Compute the standard error of estimate of this prediction, and compare it 
with the standard error obtained when the old test is used alone. Is the 
improvement enough to justify the retention of the experimental test along with the old 
one? 

7. Compute the standard error of estimate which you would obtain if you 
predicted language grades from both the old entrance test and the new experimental 
test. How does this compare with the standard error obtained by the use of the 
old test alone? Does the improvement justify the retention of the experimental 
test along with the old one? 

3. THE MULTIPLE CORRELATION COEFFICIENT 

Before studying the definition of the multiple correlation coefficient, 
you should review briefly the derivation of the simple correlation coefficient 
described in Chapter 9, Articles 2, 3, and 4. An outline of the essential 
steps is as follows: 

(1) Using y to denote the observed value and $y_p$ to denote the predicted 
value, $y_p - \bar{y}$ becomes the predicted deviation of any variate from the 
mean, $y - y_p$ becomes the error of the prediction, and $y - \bar{y}$ becomes the 
total deviation from the mean. We pointed out that if these three quantities 
are averaged in any way, the average of $y_p - \bar{y}$ divided by the average 
of $y - \bar{y}$ is a measure of the degree to which the prediction has succeeded, 
and the average of $y - y_p$ divided by the average of $y - \bar{y}$ is a measure 
of the degree to which the prediction has failed. 

(2) We then pointed out that these ratios could be interpreted as a 
"percentage of success" and a "percentage of failure" only if the averages 
are performed in such a way that the two ratios total 100 per cent. 

(3) We showed that this condition is fulfilled if we square each deviation 
before averaging. 

(4) Taking advantage of this fact, we then defined the coefficient of 
determination (D) as $\overline{(y_p - \bar{y})^2}$ (the explained variance) divided by 
$\overline{(y - \bar{y})^2}$ (the total variance). We also defined the coefficient of alienation 
(A) as $\overline{(y - y_p)^2}$ (the unexplained variance) divided by $\overline{(y - \bar{y})^2}$. These 
can be interpreted loosely as the "percentage of relatedness" and the 
"percentage of independence." 

(5) For convenience in computation, we defined the coefficient of 
correlation (r) as the square root of D. 

If we are to apply this same procedure to the case in which a variable 
z is predicted from two other variables, x and y, we must show that the 
sum of these two ratios is again equal to 1; in other words, that $\overline{(z_p - \bar{z})^2}$ 
plus $\overline{(z - z_p)^2}$ is equal to $\overline{(z - \bar{z})^2}$. We will again call these three quantities 
the explained variance, the unexplained variance, and the total variance. 
The first one is 

$$\text{Expl. Var.} = \overline{(z_p - \bar{z})^2} = \sigma_z^2\,\overline{\left(A\,\frac{x - \bar{x}}{\sigma_x} + B\,\frac{y - \bar{y}}{\sigma_y}\right)^2}$$ 

where we have substituted for $z_p - \bar{z}$ the value given by 12-2-2. If we 
now square out the right-hand side and separate the result into three 
separate means, the first term will be $A^2\,\overline{(x - \bar{x})^2}/\sigma_x^2$, or $A^2\sigma_x^2/\sigma_x^2$, or simply $A^2$. 
The cross product will be $2AB\,\overline{(x - \bar{x})(y - \bar{y})}/\sigma_x\sigma_y$, or $2ABr_{xy}$. The third 
term, in the same way, will become simply $B^2$. Inserting these values, we 
have 

$$\text{Expl. Var.} = \sigma_z^2(A^2 + 2ABr_{xy} + B^2)$$ 

If we now substitute in this expression the values of A and B given by 
equation 12-2-5, we obtain, after simplifying, 

$$\text{Expl. Var.} = \sigma_z^2\,\frac{r_{xz}^2 - 2r_{xz}r_{yz}r_{xy} + r_{yz}^2}{1 - r_{xy}^2} \tag{12-3-1}$$ 


The unexplained variance is $\overline{(z - z_p)^2}$, which is simply the square of the 
standard error of estimate. From 12-2-7 we have 

$$\text{Unexpl. Var.} = \overline{(z - z_p)^2} = \sigma_z^2\left[1 - \frac{r_{xz}^2 - 2r_{xz}r_{yz}r_{xy} + r_{yz}^2}{1 - r_{xy}^2}\right] \tag{12-3-2}$$ 

By comparing equations 12-3-1 and 12-3-2 we observe that the important 
additive property of the variances holds also for multiple correlation: 
the explained variance plus the unexplained variance equals the total variance. 
The reasoning used in defining the simple coefficient of determination 
therefore applies to the case of multiple correlation as well, and we 
therefore define the coefficient of multiple determination as the ratio of the 
explained variance to the total variance, 

$$D_{z.xy} = \frac{\overline{(z_p - \bar{z})^2}}{\overline{(z - \bar{z})^2}}$$ 

and the coefficient of multiple correlation as the square root of $D_{z.xy}$, 

$$r_{z.xy} = \sqrt{D_{z.xy}}$$ 

Thus the coefficient of multiple correlation is the square root of the ratio 
of that part of the variance in z which is predictable by joint use of x and 
y to the total variance in z. To obtain an efficient equation for computing 
it we have only to divide the explained variance (as given by 12-3-1) by 
the total variance ($\sigma_z^2$) and take the square root. The result is 

$$r_{z.xy} = \sqrt{\frac{r_{xz}^2 - 2r_{xz}r_{yz}r_{xy} + r_{yz}^2}{1 - r_{xy}^2}} \tag{12-3-3}$$ 

For the example of the relationship between grades, intelligence, and 
hours of study given in the preceding article, this equation gives us 

$$r_{G.IS} = 0.82$$ 

To give this a concrete meaning, we must square it to obtain the coefficient 
of determination: 

$$D_{G.IS} = 0.68$$ 

Thus we see that 68 per cent of the variance in grades is related to the 
students' measurable intelligence and hours of study, leaving 32 per cent 
which is related to other variables. 
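
The additive property of the variances and the value of the multiple correlation coefficient can both be verified numerically. The Python sketch below is illustrative only; the function names are our own. It computes the explained fraction of the variance in two ways, once from the multipliers A and B and once from equation 12-3-3, and confirms that the two agree.

```python
import math


def standardized_weights(r_xz, r_yz, r_xy):
    """Equation 12-2-5: least-squares multipliers A and B."""
    denom = 1.0 - r_xy ** 2
    return (r_xz - r_xy * r_yz) / denom, (r_yz - r_xy * r_xz) / denom


def multiple_determination(r_xz, r_yz, r_xy):
    """Square of equation 12-3-3: fraction of the variance in z explained by x and y."""
    return (r_xz ** 2 - 2 * r_xz * r_yz * r_xy + r_yz ** 2) / (1.0 - r_xy ** 2)


r_IG, r_SG, r_IS = 0.60, 0.32, -0.35

# Explained fraction obtained from the multipliers: A^2 + 2 A B r_xy + B^2.
A, B = standardized_weights(r_IG, r_SG, r_IS)
D_from_weights = A ** 2 + 2 * A * B * r_IS + B ** 2

# Explained fraction obtained directly from the three correlation coefficients.
D = multiple_determination(r_IG, r_SG, r_IS)
R = math.sqrt(D)

unexplained_fraction = 1.0 - D   # the remainder, by the additive property

print(f"D = {D:.2f}, R = {R:.2f}, unexplained fraction = {unexplained_fraction:.2f}")
```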


PROBLEMS 

1. Complete the derivation of equation 12-3-3, supplying and explaining all the 
missing steps. 

2. Compute the coefficient of multiple correlation between mathematics grades 
and the two test scores, using the results which you obtained in Problems 1, 3, 
and 4, Article 11, Chapter 9. 

3. What fraction of the total variance in math grades is predictable (a) from 
the old test alone? (b) From the new test alone? (c) From both tests used jointly? 
(d) What fraction is independent of both tests? 

4 and 5. Repeat Problems 2 and 3, applied now to the prediction of language 
grades instead of math grades. 


4. PARTIAL CORRELATION 

In the introduction to this chapter we raised three questions which were 
left unanswered or only partly answered by the simple theory of correlation. 
We have answered the first two of these questions by the theory of 
multiple correlation, and are now ready to devote our attention to the 
third. If a correlation exists between two variables, how can we tell 
whether it is a direct relationship or a relationship operating through a 
third intermediary variable? The significance of this question will become 
clear in the light of a specific example. 

The United States was divided up into eighteen regions for statistical 
purposes, and the following three quantities were tabulated for each region 
for the year 1930: the number of suicides per 100,000 population, the 
mean age of the inhabitants, and an index measuring the frequency of 
business failures. The correlation coefficient between suicide rate and 
business failure was 0.40, that between suicide rate and age was 0.77, 
and that between age and business failure was 0.46.* The first of these 
three figures suggests that failure in business might be an important direct 
factor in the motivation of suicide. However, an alternative possibility 
is that suicide is frequently motivated by factors connected with old age 
(such as ill health) and that business failure is caused by factors connected 
with old age (such as a decline of ability or of initiative). If this is true, it 
might be possible that the entire apparent relationship between business 
failure and suicide rate is due to the connection of both with the third 
variable and that business failure in itself is not an important motive for 
suicide. It is to give us some insight into problems like this that the theory 
of partial correlation has been developed. 

*These data are reprinted by permission of Prentice-Hall from Applied General 
Statistics by Croxton and Cowden, copyright 1939 by Prentice-Hall, Inc. 

Let us first state the problem in general terms. Two variables, x and 
y, are related to each other, directly or indirectly, with a known simple 
correlation coefficient $r_{xy}$. Both x and y are connected, to some extent, 
with a third variable, z, and we know the coefficients of correlation $r_{xz}$ 
and $r_{yz}$. Can we, by means of mathematics, separate the direct causal 
relationship between x and y from the indirect causal relationship which 
operates through z? 

To attack this problem, let us recall that the theory of simple correlation 
makes it possible to separate each value of x into a part ($x_p$) which is 
completely predictable from z, and a part ($x_r$) which is completely 
independent of z, as described in Article 2, Chapter 9. In the same way we 
can separate y into $y_p$ and $y_r$, where the latter is completely independent 
of z. If we should now find the ordinary coefficient of correlation between 
the part of x which is independent of z and the part of y which is independent 
of z, we will have a measure of that part of the relatedness between x and 
y which does not operate through z or through anything related to z. Such 
a measure is called the partial correlation coefficient between x and y. It 
will be denoted by the symbol $r_{xy.z}$, where the z in the subscript is a 
reminder that the effect of z has been removed from both variables. It is 
obvious that $r_{xy.z}$ can be interpreted as the ordinary correlation coefficient 
which we would obtain between x and y in any subsample in which the 
values of z were all of the same size. 
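
The definition just given can be carried out literally on a small set of numbers. The Python sketch below is offered only as an illustration: the eight triples of values are invented for the purpose and come from no actual study, and the function names are our own. It removes from x and from y the parts predictable from z, correlates what remains, and compares the result with the shorter formula (equation 12-4-4) derived later in this article.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation about the mean, dividing by N."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))


def corr(u, v):
    """Ordinary correlation coefficient between two lists of equal length."""
    mu, mv = mean(u), mean(v)
    covariance = mean([(a - mu) * (b - mv) for a, b in zip(u, v)])
    return covariance / (sd(u) * sd(v))


def part_independent_of(v, z):
    """The part of v left after subtracting its least-squares prediction from z."""
    slope = corr(v, z) * sd(v) / sd(z)
    mv, mz = mean(v), mean(z)
    return [(a - mv) - slope * (c - mz) for a, c in zip(v, z)]


# Hypothetical data, invented for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 11.0, 13.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0, 9.0, 8.0, 12.0]
z = [1.0, 2.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]

# Partial correlation by the definition: correlate the two residual parts.
r_by_definition = corr(part_independent_of(x, z), part_independent_of(y, z))

# Partial correlation by the shorter formula (equation 12-4-4).
r_xy, r_xz, r_yz = corr(x, y), corr(x, z), corr(y, z)
r_by_formula = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(f"by definition: {r_by_definition:.6f}   by formula: {r_by_formula:.6f}")
```

The two routes agree to within rounding error for any set of data, since the formula is an algebraic consequence of the definition.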

It would be possible, in any given problem, to carry out the operation 
described in the preceding paragraph and so derive the partial coefficient 
of correlation, but shorter methods can readily be derived. To avoid 
cumbersome subscripts, let us use the symbol X for the part of x which is 
independent of z, and Y for the part of y which is independent of z. In 
other words, let us let $X = x - x_p$, where $x_p$ is the best value of x which 
can be predicted from a knowledge of z. This best predicted value can be 
found (with a suitable change of variables) from equation 9-7-2: 

$$x_p = \bar{x} + \frac{\sigma_x}{\sigma_z}\, r_{xz}(z - \bar{z})$$ 

so that 

$$X = (x - \bar{x}) - \frac{\sigma_x}{\sigma_z}\, r_{xz}(z - \bar{z}) \tag{12-4-1}$$ 

Similarly, 

$$Y = (y - \bar{y}) - \frac{\sigma_y}{\sigma_z}\, r_{yz}(z - \bar{z}) \tag{12-4-2}$$ 

The partial correlation coefficient ($r_{xy.z}$) which we wish to compute is 
simply the ordinary correlation coefficient between X and Y, which is, 
from equation 9-6-1, 

$$r_{xy.z} = r_{XY} = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{\sigma_X\sigma_Y} \tag{12-4-3}$$ 

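We will evaluate the ingredients in this equation separately. 

(1) If we take the arithmetic mean of both sides of equation 12-4-1, 
we have 

$$\bar{X} = \overline{(x - \bar{x})} - \frac{\sigma_x}{\sigma_z}\, r_{xz}\,\overline{(z - \bar{z})} = 0$$ 

Similarly, $\bar{Y} = 0$. 

(2) The standard deviation of X can be obtained by applying equation 
4-4-1. Remembering that $\bar{X} = 0$, this becomes $\sigma_X = \sqrt{\overline{X^2}}$, or, inserting 
12-4-1, squaring, and simplifying, 

$$\sigma_X = \sqrt{\overline{(x - \bar{x})^2} - 2\,\overline{(x - \bar{x})(z - \bar{z})}\; r_{xz}\frac{\sigma_x}{\sigma_z} + r_{xz}^2\frac{\sigma_x^2}{\sigma_z^2}\,\overline{(z - \bar{z})^2}}$$ 

We recognize $\overline{(x - \bar{x})^2}$ and $\overline{(z - \bar{z})^2}$ as $\sigma_x^2$ and $\sigma_z^2$ respectively, and from 
9-6-2 we recognize $\overline{(x - \bar{x})(z - \bar{z})}$ as $r_{xz}\sigma_x\sigma_z$. Making these substitutions 
and simplifying, we have 

$$\sigma_X = \sigma_x\sqrt{1 - r_{xz}^2}$$ 

Similarly, 

$$\sigma_Y = \sigma_y\sqrt{1 - r_{yz}^2}$$ 

(3) We obtain $\overline{XY}$ by multiplying together the values given by 12-4-1 
and 12-4-2, forming the mean, and simplifying as before: 

$$\overline{XY} = \sigma_x\sigma_y(r_{xy} - r_{xz}r_{yz})$$ 

Substituting these results in 12-4-3, we have 

$$r_{xy.z} = \frac{r_{xy} - r_{xz}r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}} \tag{12-4-4}$$ 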

We thus see that $r_{xy.z}$ can be obtained directly from the separate 
correlation coefficients, without the labor of computing the portions of each 
individual x and y which are independent of z. To apply this result to 
the problem of the relationship between business failure and suicide rate, 
let us denote the suicide rate by S, the business failure index by F, and 
the average age by A. Then, applying 12-4-4, 

$$r_{SF.A} = \frac{r_{SF} - r_{SA}r_{FA}}{\sqrt{(1 - r_{SA}^2)(1 - r_{FA}^2)}} = \frac{0.40 - (0.77)(0.46)}{\sqrt{(1 - 0.77^2)(1 - 0.46^2)}} = 0.081$$ 

Thus we see that when the effect of age has been removed from both factors, 
there remains only a negligibly small relationship between business failure 
and suicide rate. Stated in another way, we see that if it were possible 
to obtain a sufficiently large subsample of communities all with the same 
average age, then the ordinary correlation coefficient between business 
failure and suicide rate within this subsample would be only 0.081. We 
conclude that business failure is of extremely little importance in the 
motivation of suicide. 
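
Equation 12-4-4 is easily put into a form suitable for machine computation. The following Python sketch, in which the function name is our own, repeats the computation for the suicide example.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Equation 12-4-4: correlation between x and y with the effect of z removed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


r_SF = 0.40   # suicide rate and business failure
r_SA = 0.77   # suicide rate and mean age
r_FA = 0.46   # business failure and mean age

r_SF_A = partial_correlation(r_SF, r_SA, r_FA)
print(f"r_SF.A = {r_SF_A:.3f}")
```

The function takes the simple coefficient between the two variables of interest first, followed by the coefficient of each with the variable whose effect is to be removed.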

The partial correlation coefficient in the above example turned out to 
be smaller than the original correlation coefficient between the two variates. 
To see that this is not always the case, let us consider the data given in the 
preceding articles for the intelligence (I) of 450 students at Syracuse 
University, their grade average (G) in terms of honor points, and their 
hours of study (S) per week: 

$$r_{GS} = +0.32 \qquad r_{GI} = +0.60 \qquad r_{IS} = -0.35$$ 

If a student examines these results with the purpose of estimating the 
grade improvement which he could expect if he were to study more hours 
per week, he might reason as follows: "The first coefficient, 0.32, indicates 
a moderate relationship between study and grades, and one explanation 
for this relationship is that more study causes higher grades. On the 
other hand, it is possible that the more intelligent students are motivated 
to study more, and it is certain that the more intelligent students get 
higher grades because of their greater intelligence. Thus the apparent 
relationship between grades and study may reflect merely the degree to 
which both are controlled by intelligence, in which case, since I cannot 
alter my intelligence, more study would be useless. For my purposes, it 
would be better to know the correlation coefficient between study and 
grades for the subgroup of students who have nearly the same intelligence 
that I have." This latter quantity can be obtained from 12-4-4: 

$$r_{GS.I} = \frac{0.32 - (0.60)(-0.35)}{\sqrt{(1 - 0.60^2)(1 - 0.35^2)}} = 0.71$$ 

Thus we see that the relationship between grades and study for students 
of a given intelligence is much stronger than it is for the student body as a 
whole. 
To cast further light upon this situation, let us compute the partial 
correlation coefficient between intelligence and grades, with the effect of 
study removed, 

$$r_{IG.S} = 0.80$$ 

which is again larger than the simple correlation coefficient (0.60) between 
these variables. In other words, intelligence has a strong effect upon grades, 
for students who study a fixed number of hours per week, but in the general 
student body this effect is partially obscured because we mix together 
students of various study habits. 

The explanation of these interrelationships is obviously that a student 
with a high intelligence can use it in either of two ways: first, he can study 
an average amount and make high grades; second, he can content himself 
with average grades, in which case he needs to study only a very little. If 
many of the intelligent students make the latter choice and work only a 
few hours a week to secure mediocre grades, then the intelligent students 
as a whole will have little higher grades than the others. In this case the 
causal relationship between intelligence and grades will not appear very 
strongly in an overall tabulation, but will become apparent when we 
compare students who study the same number of hours per week. Since this 
is exactly what we observe, it appears that many of the intelligent students 
make the second of the two choices. 

To verify the sad conclusion that intelligent students frequently become
very lazy, let us compute the relationship between intelligence and hours
of study for any subgroup of students all receiving the same grades:

$$r_{IS \cdot G} = -0.72$$

Thus we would expect that for all the students who have a “C” average
(for example) there is a very high negative correlation between intelligence
and hours of study. The group therefore includes many gifted students
who work very little, and many intrinsically poor students who work very
hard to compensate for their shortcomings.
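
The three partial coefficients above can be checked with a few lines of Python. This is only a sketch: the function name `partial_corr` is an illustrative choice, and the assignment of 0.32 to grades and study and of -0.35 to study and intelligence is read off the worked equation rather than stated separately in the text.

```python
import math


def partial_corr(r_ab: float, r_ac: float, r_bc: float) -> float:
    """Correlation of A and B with the effect of C removed (form of equation 12-4-4)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


# Simple correlations used in the example (G = grades, S = study, I = intelligence).
# The pairing of 0.32 and -0.35 with these variables is an assumption taken
# from the worked equation in the text.
r_gs, r_gi, r_si = 0.32, 0.60, -0.35

r_gs_i = partial_corr(r_gs, r_gi, r_si)  # grades and study, intelligence fixed
r_ig_s = partial_corr(r_gi, r_si, r_gs)  # intelligence and grades, study fixed
r_is_g = partial_corr(r_si, r_gi, r_gs)  # intelligence and study, grades fixed

print(round(r_gs_i, 2), round(r_ig_s, 2), round(r_is_g, 2))
```

Rounded to two places, the three results should agree with the coefficients quoted in the discussion.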

One word of caution may be appropriate before we leave the topic of
partial correlation. When we find that $r_{AB \cdot C}$ is very much smaller than
$r_{AB}$, it is customary to accept this as evidence that the causal relationship
between A and B operates through C, rather than directly between A and
B, since the relationship vanishes upon removing the effect of C upon
both A and B. But a review of the method will show that we not only
remove the effect of C upon A and B, but also remove the effect of any
variable which is strongly related to C, and we must therefore make our
conclusion a little weaker. If $r_{AB \cdot C}$ is very much smaller than $r_{AB}$, then it
is likely that the original causal relationship operated not directly between
A and B, but indirectly through C, or through some variable or variables
which are related to C.

PROBLEMS 

1. In a study of college dating, it was shown that there was a negative correlation
between the number of dates per week and the grades obtained in school.
This was interpreted to mean that “too much dating is bad for the students’
minds” and diminishes their ability to concentrate on their work. An alternative
explanation was simply that dating took time which might otherwise be spent on
study. The correlation coefficients are

$$r_{DG} = -0.42, \qquad r_{DS} = -0.55, \qquad r_{SG} = 0.77$$

where D is the number of dates per week, S is the hours of study per week, and
G is the student’s grade. Compute $r_{DG \cdot S}$ and comment upon the tenability of the
second explanation.

2. Given the information that $r_{EI}$ is 0.72, $r_{AI}$ is 0.64, and $r_{AE}$ is 0.51, where I
is the income of each individual, E is the number of years of education which he
has received, and A is his native ability as measured by intelligence tests. Does
this indicate that more education is likely to increase the income of any individual,
or does it simply reflect the fact that people with higher ability have more earning
power on the one hand and are likely to remain in school longer on the other hand?

3. Using the results of the problems in Article 11, Chapter 9, compute $r_{LE \cdot X}$, and
discuss the possible reasons for its difference from $r_{LE}$.

4. Compute $r_{ME \cdot X}$. How does this differ from the situation in Problem 3?

5. SUMMARY 

There are two central topics in this chapter. The first is that of multiple
correlation and regression, which is concerned with the prediction of one
variable from several other variables used simultaneously. The second,
which is related mathematically but is quite different in objective, is that
of partial correlation, which is concerned with the analysis of the channels
through which various causes may operate on a given variable. The
operational procedures and interpretations of these two topics will be
summarized separately.

I. Multiple Correlation and Regression

1. Prediction Equations. If a variate z is to be estimated or predicted,
and two related variates x and y are known, then a much better
prediction can generally be secured by using x and y simultaneously in a
single equation than can be secured by using either of them alone. To
make such a prediction, the successive steps are as follows:

(a) Compute $\bar{x}$, $\bar{y}$, $\bar{z}$, $\sigma_x$, $\sigma_y$, and $\sigma_z$. This step can conveniently be
combined with step (b).

(b) Compute the ordinary coefficients of correlation between x, y, and
z. Call these $r_{xy}$, $r_{xz}$, and $r_{yz}$.

(c) Compute A and B from equations 12-2-5:

$$A = \frac{r_{xz} - r_{xy} r_{yz}}{1 - r_{xy}^2}, \qquad B = \frac{r_{yz} - r_{xz} r_{xy}}{1 - r_{xy}^2}$$

(d) Insert these in equation 12-2-2:

$$\frac{z - \bar{z}}{\sigma_z} = A \, \frac{x - \bar{x}}{\sigma_x} + B \, \frac{y - \bar{y}}{\sigma_y}$$

This is the required regression equation. Its use is illustrated in Article 2.
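
Steps (c) and (d) can be sketched in Python as follows. The function names and all the numerical inputs are hypothetical, chosen only to show the arithmetic; the formulas assumed are the usual two-predictor regression in standard units.

```python
def regression_weights(r_xy: float, r_xz: float, r_yz: float) -> tuple[float, float]:
    """Weights A and B for predicting z from x and y, all in standard units."""
    denom = 1 - r_xy**2
    a = (r_xz - r_xy * r_yz) / denom
    b = (r_yz - r_xz * r_xy) / denom
    return a, b


def predict_z(x, y, means, sigmas, r_xy, r_xz, r_yz):
    """Predicted z for given x and y, converted back from standard units."""
    x_bar, y_bar, z_bar = means
    s_x, s_y, s_z = sigmas
    a, b = regression_weights(r_xy, r_xz, r_yz)
    return z_bar + s_z * (a * (x - x_bar) / s_x + b * (y - y_bar) / s_y)


# Hypothetical summary statistics (not taken from the text).
means = (50.0, 30.0, 70.0)   # x_bar, y_bar, z_bar
sigmas = (10.0, 5.0, 8.0)    # sigma_x, sigma_y, sigma_z
a, b = regression_weights(r_xy=0.5, r_xz=0.6, r_yz=0.4)
z_hat = predict_z(60.0, 30.0, means, sigmas, r_xy=0.5, r_xz=0.6, r_yz=0.4)
print(a, b, z_hat)
```

When x and y are uncorrelated, the weights reduce to the simple correlations of x and of y with z, which is a quick sanity check on the formulas.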
2. Standard Error of Estimate. When a prediction has been made
by means of the above equations, its standard error can be computed by
equation 12-2-7:

$$S_{z \cdot xy} = \sigma_z \sqrt{1 - \frac{r_{xz}^2 - 2 r_{xz} r_{xy} r_{yz} + r_{yz}^2}{1 - r_{xy}^2}}$$

This has the same meaning as the standard error of estimate in simple
correlation theory. It is the square root of the mean of the squares of the
errors of estimation. It can be used, with the normal curve tables, to
compute the probability that the true value will lie between any given
limits.

3. Coefficient of Multiple Correlation. This quantity is computed
by means of equation 12-3-3:

$$r_{z \cdot xy} = \sqrt{\frac{r_{xz}^2 - 2 r_{xz} r_{xy} r_{yz} + r_{yz}^2}{1 - r_{xy}^2}}$$

To interpret a given value of $r_{z \cdot xy}$, it is desirable first to square it to
obtain the coefficient of multiple determination, $D_{z \cdot xy}$, which is then the
fraction of the total variance which is explained or predicted by the multiple
regression equation. It can be thought of as that fraction of the total
causation of z which is related to x and y. The computation and interpretation
of this quantity is illustrated in Article 3.
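
The coefficient of multiple determination and the standard error of estimate from items 2 and 3 can be computed together, as in the sketch below. The function names and the input values are hypothetical, and the formulas assumed are the standard two-predictor forms.

```python
import math


def multiple_determination(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Fraction of the variance of z accounted for by x and y together."""
    return (r_xz**2 - 2 * r_xz * r_xy * r_yz + r_yz**2) / (1 - r_xy**2)


def standard_error_of_estimate(sigma_z: float, r_xy: float, r_xz: float, r_yz: float) -> float:
    """Root-mean-square error of the two-variable prediction of z."""
    return sigma_z * math.sqrt(1 - multiple_determination(r_xy, r_xz, r_yz))


# Hypothetical correlations and standard deviation (not from the text).
d = multiple_determination(r_xy=0.5, r_xz=0.6, r_yz=0.4)
multiple_r = math.sqrt(d)
s_est = standard_error_of_estimate(sigma_z=8.0, r_xy=0.5, r_xz=0.6, r_yz=0.4)
print(d, multiple_r, s_est)
```

With these inputs the multiple correlation comes out a little above the larger simple correlation (0.6), as one would expect when the second variable adds some predictive information.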


II. Partial Correlation 


1. Coefficient of Partial Correlation. This is computed from
equation 12-4-4:

$$r_{xy \cdot z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$

This quantity is the ordinary coefficient of correlation between the part
of x which remains when the effect of z is removed, and the part of y which
remains when the effect of z is removed.

2. Interpretation. If we assume that variations in x are causing
variations in y, then $r_{xy \cdot z}$ measures the part of the causation which acts
directly rather than through the intermediate quantity z. If the coefficient
of partial correlation is very nearly zero, then we know that any causative
effect of x on y does not act directly but instead acts entirely through the
intermediate variable z or through variables related to z. If, on the other
hand, the partial coefficient of correlation is of about the same size as the
ordinary coefficient, then we know that the causation does not act through
z or through variables related to z, but instead acts directly upon y, or
possibly through other variables not under consideration which are independent
of z. As in the case of simple and multiple correlation theory,
it should be borne in mind that the mathematical analysis contains no
information about the nature of the causal relationship, but only about the
strength of the relationship. Some applications of the partial correlation
coefficient are discussed in Article 4.
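
The description of the partial coefficient as a correlation between "what remains" of x and of y after the effect of z is removed can be checked numerically. The sketch below, on a small set of made-up observations, removes the least-squares linear dependence on z from x and from y and compares the correlation of the residuals with the value given by the formula. All names and data here are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def corr(a, b):
    """Ordinary (product-moment) coefficient of correlation."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def residuals(a, z):
    """Part of a that remains after removing its least-squares line on z."""
    ma, mz = mean(a), mean(z)
    slope = sum((p - ma) * (q - mz) for p, q in zip(a, z)) / sum((q - mz) ** 2 for q in z)
    return [(p - ma) - slope * (q - mz) for p, q in zip(a, z)]


# Made-up observations, for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0, 8.0]
z = [1.0, 2.0, 4.0, 3.0, 6.0, 5.0]

r_xy, r_xz, r_yz = corr(x, y), corr(x, z), corr(y, z)
by_formula = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
by_residuals = corr(residuals(x, z), residuals(y, z))
print(by_formula, by_residuals)
```

The two printed values should agree to within rounding error, since the formula is an algebraic shortcut for the residual computation.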



CHAPTER 13

STATISTICS AND COMMON SENSE 


1. INTRODUCTION 

It is customary in many circles to be suspicious of conclusions drawn
from mathematical analysis of statistical data. This is to some extent
attributable to the fact that the methods of statistical analysis have frequently
been misused. Some of this misuse has undoubtedly arisen from
intentional dishonesty, but by far the larger share has been due to carelessness
or ineptness on the part of the investigator. In this chapter we shall
point out a few of the pitfalls against which you should guard, both in
performing statistical analyses of your own and in interpreting the results
of others. These pitfalls have been grouped into somewhat similar classes
for convenience, but the classification is based upon differences of emphasis
rather than differences of kind, and many of the illustrative cases could
have been placed in any of several classifications with equal validity.

2. INADEQUATE INFORMATION ABOUT DATA 

This first classification of statistical fallacies covers a variety of errors
which arise because the investigator allows himself to lose touch with the
exact meaning of the original data upon which he is basing his study. It
should always be remembered that the data in statistical tabulations
constitute a much abbreviated description of events which may have been
very complex, and if the investigator is to use the data intelligently, he
must know exactly how these complex events were reduced to numbers
in tables. If, for example, you should find that an annual report of traffic
accidents shows a 5 per cent increase in accidents due to intoxication as
compared to last year, you might perhaps regard this as evidence of a real
increase in drunken driving. But, totally aside from the question of
whether the difference is large enough to be statistically significant, there
is a strong question of whether the numbers in the table mean exactly
what they appear to mean. If Mr. John Oatman, while driving his 1940
Plymouth home after having had two cocktails, speeds up to 35 miles
per hour in a 25 mile zone and bumps the rear end of another car, what is the
cause of the accident? Poor visibility? It was raining, but there was
no actual fog. Physical handicaps of driver? Mr. Oatman was not
wearing his glasses and has a driver's license which permits him to drive
only with glasses. Defects in vehicle? The brakes were passed at the
last inspection; maybe they’ve gotten a little out of adjustment since
then. At some point a complex judgment has to be made to convert Mr.
Oatman into a statistic. Perhaps the increase in the percentage of accidents
due to drunkenness (as tabulated) represents a true increase in drunken
driving. Perhaps it merely represents the fact that as a result of a few
serious accidents involving drunkenness the enforcement officers have
become more severe with drivers who have been drinking, so that more
borderline cases get into this classification. The only persons who are
qualified to interpret the results are those who are thoroughly familiar
with the details of the machinery for collecting the original data.

The same problem of subjective effects on the part of the original collector
of data is sometimes conspicuous in tabulations of vital statistics.
Doctors have frequently commented upon the fact that in tabulations
of causes of death there is a tendency for people to die of “fashionable”
diseases. To see why this is so, we have only to picture a typical case of
an elderly patient who is clearly approaching the end of his life span, and
who is suffering from heart disease, high blood pressure, and kidney
disease, any of which might result in death. The patient's condition
deteriorates steadily and terminates in death, and the attending physician
must then assign the cause of death. If we suppose that the doctor has
recently read some important new articles about treatment of kidney
disease, then he is likely to be particularly aware of this illness in the
patient and perhaps is likely to emphasize it in choosing the cause of
death. It would obviously be very unwise to accept at face value a small
apparent increase in deaths due to such an illness.

The above examples are both concerned with a failure on the part of
the investigator to be aware of possible subjective effects on the part of the
man who first makes an entry in a table to describe an event. Another
danger is that the investigator might be insufficiently informed about the
exact criteria of classification of the original data. Given the figures on
the number of farms in a given state, the number of hired men on these
farms, and the total volume of farm products, all for two successive years,
can you find whether the productivity per farm is increasing or decreasing,
and whether the farmers are hiring more or fewer men than they did
formerly? Before you could safely draw these conclusions, you would
have to know exactly what the investigators in each of the two years
meant by “farm,” and “farmer,” and “farm employee,” and “farm
product.” If a retired policeman raises an acre of vegetables primarily
for his own use, you would probably agree that he should not be counted
as a farmer in a statistical tabulation; if he raises twenty acres of crops
and sells 90 per cent of them, you would perhaps include him. If his
brother helps him on the farm during a two-week vacation, he would
hardly be described as a farm employee, but what if he works two months,
or five months? Somewhere in this range an arbitrary line must be drawn.
If the data have been properly collected, this line has been drawn exactly
by means of a carefully worded quantitative definition, which admits of
no uncertainty when it is applied to a specific case, and it is the duty of
anyone who subsequently uses the data to know exactly what this definition
is. In 1905, for instance, the United States Census of Agriculture
defined a “farm” in such a way as to exclude any tract of land of less than
3 acres, unless agricultural products to the value of $250 or more were
produced on it in the previous year. In 1910 the definition was changed
somewhat. Clearly a direct comparison between these two years would
be misleading unless the investigator was aware of the change of
classification.

3. NON-REPRESENTATIVE SAMPLE 

This is a statistical pitfall which probably outweighs in frequency of
occurrence all the others combined; it is easily the most important pitfall
of statistical analysis. We have seen that a statistician must frequently
limit his study to a sample chosen at random, either for reasons of economy
or because the sample is all that is available for study, and that he must
draw his conclusions about the universe from his study of the sample. In
any such study, a basic assumption is that the sample is random, that is,
that it has been drawn in such a way that no particular kind of variate has
any higher likelihood of being included in the sample than any other. At
first glance, the selection of a random sample appears to be a simple matter,
but in practice it is filled with unexpected difficulties. A few examples
will indicate the nature of some of the difficulties to be encountered.

Public opinion polls in which the information is collected by phone or
by mail are frequently unreliable. Even if the pollees are chosen at random
from a telephone directory, they still represent only a random sample of
that part of the population which has telephone service, which is quite different
in economic level and in other ways from the part of the population which
does not have telephone service. Mailing lists are sometimes chosen at
random from the files of public utilities, and here the same kind of selection
arises. The people who do not have electricity in their homes, for
example, might be excluded from such a poll, and these people are likely
to be from a lower than average economic level and therefore are likely
to have a somewhat different political orientation from that of other
groups. One of the classic examples of a non-representative sample of this
sort was the 1936 public opinion poll carried out by the Literary Digest.
More than ten million questionnaires were mailed out, and almost two
and a half million were returned. On the basis of the opinions expressed,
it was predicted that Landon would win 370 electoral votes to 161 for
Roosevelt, while in the actual election Landon won 8 and Roosevelt 523!
Here there was a second kind of selection operating as well, inasmuch as
only about one-fourth of the questionnaires were returned. The one-fourth
who were sufficiently motivated to return the questionnaires were
presumably the ones who felt most strongly about political matters, and
it is possible that the ones who felt most strongly about political matters
were primarily those who proposed to upset the status quo, that is, the
Landon supporters! The case is particularly interesting because a careful
study of the results of the poll itself would have revealed evidence of
non-random selection. Some of the pollees were asked “How did you vote in
1932?” and the results of this question indicated that about 50 per cent
of the people answering the questionnaire voted for the Republican candidate
in 1932. But the overall vote in 1932 was almost 60 per cent
Democratic! The sample was therefore obviously drawn somewhat more heavily
from the Republican voters than from the Democratic. A similar situation
occurred in the opinion polls preceding the Truman-Dewey election.

Examples of this kind of non-random sampling or bias can be enumerated
indefinitely. In one local tabulation the average size of the families in the
region turned out to be surprisingly high. Upon investigation it was found
that: (1) the data was collected by door-to-door interviewers; (2) if no
one was at home, an attempt was made to revisit the home later, but this
had not been completely carried out and some of the homes were missed
and not revisited. Now it is obvious that in general the larger the family
the greater is the probability that someone will be home when the interviewer
calls, and that the omission of families where no one was at home
represents a non-random omission with regard to family size.

A somewhat different kind of non-random sampling arose in the following
case. An article appeared recently by a physician who specialized in
women’s illnesses. He participated in a semi-charity clinic and also
carried on a private practice, and he wrote that he had become increasingly
aware of the fact that the incidence of a certain illness among his private
patients was much higher than it was among the “lower classes” whom
he attended in the clinic. Since the illness often resulted in sterility, he
concluded that the “better classes” were being menaced by this extra
margin of fertility in the “lower classes.” His results were strongly criticized
on the grounds that the clinic patients represented a more or less
random sampling of various illnesses, while his private patients were
strongly selected by the fact that he was a well-known specialist in this
particular illness and would naturally tend to be chosen by people suffering
from it who could afford to pay for their medical care!

Another sort of non-random sampling in medical statistics has been
commented upon by Professor Wiener in his delightful book The Human
Use of Human Beings: “In connection with mental cases, but also with
many others, I wish to deplore the fast-and-loose way with which most
doctors play with statistics. A disease is first recognized in those cases in
which it assumes an acute or even fulminating form. Accordingly, the
early statistics give a disease, whatever it may be, a high mortality rate,
and a large list of complications. Later on, similar physiological or mental
changes are recognized in patients who are less ill and who would probably
have recovered anyhow. At least, many of them might have led a useful
life for several years even without treatment. When treated, these less
serious cases respond far better than the cases already doomed. ‘Ah-ha,’
says the doctor. ‘Look at my statistics. My esteemed predecessors saved
only half their patients, and I saved nine-tenths.’ What a triumph for
medicine!”*

Another particularly troublesome kind of non-random sampling arises
in cases where the original selection of individuals is random, but in which
the process of carrying out the investigation alters those individuals so
that they are no longer representative. This problem appears in many
guises. It is an easily demonstrated fact, for example, that interviewees
sometimes tend to modify their answers in such a direction as to win the
approval of the questioner.** A specific instance of a sample which is
altered by the investigation is the following. There is currently being
conducted in one of our large cities an experiment to determine all the
long-range effects of good or ill health in children, and a set of representative
families have been asked to cooperate by permitting the investigators
to make periodic detailed physical examinations of their children and to
study a number of possibly health-related factors such as, for example,
grades in school and social adjustment. It is inevitable that the act of
participation in the study, and the frequent health checkups, will increase
the awareness of health problems in these families, so that their behavior
in matters of health will no longer be representative of the population at
large, no matter how successful the original random selection may have
been.

*Reprinted from The Human Use of Human Beings by Prof. Norbert Wiener, by
permission of the Riverside Press. Copyright by the Riverside Press.

**It is said that in rural areas investigators who ask “How many times a week on the
average do you take a bath?” will receive answers centering around two or three, but
if they ask “Which night of the week do you generally take your bath?” the answer is
usually “Saturday”!

The Fish and Game authorities in Ohio have been accustomed to estimate
the fish population of the various lakes by netting large numbers of
fish, marking them by clipping the end of one fin, returning them to the
lake, and finally, after the marked fish have had time to mix thoroughly
with the general population of the lake, netting samples of this mixed
population. From the records of the number of marked fish known to be
in the lake, and the observed percentage of fish which are found to be
marked in the later samples, the total population of the lake can be found
in an obvious way. The method yields results of high accuracy for small
lakes, but for larger lakes the results are more and more obviously in
error, and the authorities are now searching for a better method. Here
the trouble is that the fish do not mix thoroughly, but tend to swim in
schools, so that no sample can be trusted to be random with respect to
the percentage of marked fish. Also there is a possibility that the process
of netting and marking the fish alters their future behavior (for example,
they may learn to avoid the net), and so alters their probability of being
netted again in future. Here again we see that the investigator is unable
to prevent his experiment from altering the properties of his sample.
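
The "obvious way" of finding the total population can be sketched in Python: if the marked fish have mixed thoroughly, the fraction marked in a later sample should be about the same as the fraction marked in the whole lake. The function name and the counts below are hypothetical, and the sketch rests on exactly the random-mixing assumption that the text says fails for large lakes.

```python
def estimate_population(marked_released: int, sample_size: int, marked_in_sample: int) -> float:
    """Estimate the total population from the fraction of a later sample found marked.

    Assumes the marked fish are thoroughly mixed and equally likely to be netted.
    """
    if marked_in_sample <= 0:
        raise ValueError("no marked fish in the sample; the estimate is undefined")
    fraction_marked = marked_in_sample / sample_size
    return marked_released / fraction_marked


# Hypothetical counts: 500 fish marked and released; a later netting of
# 200 fish contains 25 marked ones, i.e. one-eighth of the sample.
estimate = estimate_population(marked_released=500, sample_size=200, marked_in_sample=25)
print(estimate)  # 4000.0
```

If marked fish travel in schools, or learn to avoid the net, the observed fraction no longer reflects the true fraction, and the estimate is biased accordingly.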

4. CORRELATION AND CAUSATION 

In the chapters on correlation we emphasized the fact that a correlation
coefficient measures the degree to which two variables are related,
but that it does not provide any direct information about the nature of the
causal relationship. Failure to remember this leads to many a statistical
absurdity. A popular joke among statisticians concerns the high coefficient
of correlation which can be found between the average pay for
school teachers and the total number of dollars per year spent for whiskey
over the last century! This, it is pointed out, clearly indicates that it is
folly to increase the pay of teachers, since it is obvious that any increase
which they obtain will only be spent on whiskey! In this case the true
explanation of the correlation is not difficult to find. During the interval
covered, there was a steady decline in the purchasing power of money.
Two dollars once represented a day’s work, while ultimately it came to
represent little more than an hour’s work. This change produced a great
increase in the monetary valuation of all commodities, including whiskey,
teachers’ labor, and many other things.

More serious examples are not difficult to find. For example, a typical
misleading correlation is that between the suicide rate in various communities
and the church attendance in these communities. Instead of
indicating that religious people are more likely to commit suicide than
non-religious people, this is again a case in which the two variables are
not directly related causally, but are instead both related to a third variable,
which in this case is the size of the community.

Another such example is the high correlation between the number of
Jewish bakeries in various communities and the average wage of bakery
employees in these communities. An inexperienced statistician might
easily conclude that this indicated that wages in Jewish bakeries were
higher than in comparable non-Jewish bakeries. Upon investigation,
however, this turns out to be not the case. Instead, it can be shown that
all bakeries in large cities have higher wage scales than those in small
cities or rural communities, and it can also be shown that the percentage
of Jewish population in large cities is generally higher than in rural areas.
The Jewish bakeries and the non-Jewish bakeries in any given community
are likely to have about the same wage scale. These are obviously problems
in which partial correlation theory would clarify the causal relations.

5. INAPPROPRIATENESS OF DATA 

Many statistical discussions are sound in logic but faulty because the
data chosen to support the argument are not appropriate for the purpose.
A simple example of inappropriate data is the following. Suppose that
you are planning to travel from New York City to San Francisco by train
or by air, and are curious about the relative safety of the two modes of
travel. If you found that in 1950 there were 111 commercial aviation
fatalities, in a total of 48,000,000 man hours of flying, or an average of 2.3
fatalities per million man hours of flying, while in railroad travel there
were 3627 fatalities in 630,000,000 man hours of travel, or an average of
5.8 per million man hours of travel, would this justify you in choosing the
plane on the grounds of its greater safety? Certainly not, for several
obvious reasons and several not so obvious. In the first place, a given trip
will require far more rail hours than air hours, and a better basis for comparison
would be that of fatalities per passenger mile. In the second place,
since accidents in flying frequently occur at takeoff or landing, a long trip
has a lower fatality rate per passenger mile than a short trip, and we
should investigate the fatalities per passenger mile for trips as long as the
one you contemplate.

But these are secondary considerations; the important consideration
is: Were the original data collected in such a way as to be applicable to
the question under consideration? In this case a closer investigation will
show that of the 3627 rail fatalities, 399 victims were railway employees,
and these should obviously be excluded if we wish to evaluate the hazard
to a passenger. Furthermore, 1218 were what the railway tabulators
describe as “trespassers,” and these should be excluded for the same
reason, unless you are contemplating a trip to San Francisco on the rods
underneath a freight car! But this is not the end; 1698 of the fatalities
involved people killed at grade crossings, and in short, only 184 of the 3627
were ordinary paying passengers! When we have made a similar scrutiny
of the data for air travel to be certain that the data are applicable to our
problem, then we can draw a conclusion about relative safety.*
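
The effect of the choice of data on the comparison can be seen by recomputing the rates, as in the short sketch below. It uses the man-hour totals quoted in the text and the passenger-mile figures quoted in the footnote to this passage; the helper name `rate` is an illustrative choice.

```python
def rate(deaths: int, exposure: int, per: int) -> float:
    """Deaths per `per` units of exposure (man hours or passenger miles)."""
    return deaths / exposure * per


# All fatalities per million man hours (figures quoted in the text for 1950).
air_per_million_hours = rate(111, 48_000_000, 1_000_000)
rail_per_million_hours = rate(3627, 630_000_000, 1_000_000)

# Passenger deaths per 100,000,000 passenger miles (figures from the footnote).
air_per_100m_miles = rate(96, 8_363_000_000, 100_000_000)
rail_per_100m_miles = rate(184, 31_800_000_000, 100_000_000)

print(round(air_per_million_hours, 1), round(rail_per_million_hours, 1))
print(round(air_per_100m_miles, 1), round(rail_per_100m_miles, 2))
```

On the first basis air travel appears safer than rail; on the second, which counts only passengers and measures exposure in miles, the ordering is reversed.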

*These figures are reprinted, with permission, from the 1950 report of the National
Safety Council. For 1950, domestic scheduled air lines had 96 passenger deaths in
8,363,000,000 passenger miles, or 1.1 fatalities per 100,000,000 passenger miles, while
the railroads had 184 passenger deaths in 31,800,000,000 passenger miles, or 0.58 per
100,000,000 passenger miles.

A special kind of inappropriateness of data arises when we choose an
inappropriate average to sum up a complex situation. Suppose that you
were working with a list of all the factories in a given community and that
the data included the mean weekly wage in each factory. If you wished
to know the mean weekly wage paid in factories in the community, would
it be permissible to take a straight mean of these mean salaries? To do
so would give equal weight to a large factory employing several thousand
workers and a small factory employing half a dozen, and the result would
be very misleading if, for example, the small factories paid lower wages.
A more representative average would be obtained by weighting the individual
means in proportion to the number of men employed at each factory.
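
The difference between the straight mean and the weighted mean can be illustrated with a small sketch. The factory sizes and wages below are invented for the purpose and do not come from any tabulation.

```python
# Each entry is (number of workers, mean weekly wage in dollars).
# One large factory and three small ones; all figures are hypothetical.
factories = [(2000, 80.0), (6, 50.0), (5, 52.0), (8, 48.0)]

# Straight mean of the factory means: every factory counts equally.
straight_mean = sum(wage for _, wage in factories) / len(factories)

# Weighted mean: every worker counts equally.
total_workers = sum(workers for workers, _ in factories)
weighted_mean = sum(workers * wage for workers, wage in factories) / total_workers

print(straight_mean, round(weighted_mean, 2))
```

Here the straight mean is pulled far below the wage that nearly all of the workers actually receive, because the three small factories carry three-quarters of the weight while employing less than one per cent of the men.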


6. FALLACY OF LARGE NUMBERS

This article will discuss a kind of statistical fallacy which is somewhat
different from the preceding ones. The fallacies which we have already
discussed are logical fallacies, while this one is psychological and overlaps
several of the others.

Let us first comment upon the opposite fallacy, that of small numbers.
When John Doe reads an advertisement which asserts that in a poll of
ten famous film stars, five were found to smoke Brand A cigarettes, while
only two smoked Brand B and only one Brand C, he will presumably be
influenced to smoke Brand A in future. As a student of statistics, you
might not agree with Mr. Doe's decision. Aside from the perplexing
question of why you should be motivated to smoke Brand A simply because
it is preferred by film stars, you might reasonably raise the question
of whether the figures prove beyond a reasonable doubt that Brand A is in
fact preferred by movie stars in general. With the methods of Chapter
11, you can readily prove that this is not a conclusion which you are
justified in accepting as proved by the given facts.

The fallacy of large numbers, on the other hand, is one to which trained
statisticians are unfortunately more susceptible. Let us describe it by
means of an example. In a recent master's thesis, an education student
attempted to show that a new experimental course which he conducted
resulted in a demonstrable gain in understanding of motivations of people
in problem situations. He distributed a mimeographed account of a
problem situation at the beginning of the course and asked the students
to write an analysis of it. At the end of the course he asked them to
analyze the same case again, and he then distributed the pairs of analyses
(without identification) to cooperating members of the faculty and asked
them to decide which of the two showed more understanding of the problems
involved. Each pair of papers was read by several judges, so that
the scatter of the answers was available as a measure of the uncertainty of
the measurement. The number of experimental students in the class was
sufficiently large so that the probable error of the class mean was small, and
the difference between the class ability at the beginning of the term and the
end of the term turned out to be many times its probable error. The
difference was thus overwhelmingly proved to be significant, and the investigator
concluded that the value of the experimental course was established.

The fallacy of his conclusion lies in the fact that the argument is not
merely a two-sided one but a three-sided one. The observed improvement
may have been due to chance, or it may have been due to the effect
of the experimental course, or it may have been due to some totally different
cause. The investigator secured so overwhelming a defeat over the hypothesis
of chance that he assumed that the job was finished, and overlooked
the third antagonist altogether! During the time in which the experiment
was in progress, the students of course were subjected to many influences
besides the experimental class; they were attending other classes, meeting
new people, and indulging in new activities, and any of these might have
produced the observed improvement in understanding. The investigator’s
results proved that a significant improvement had occurred in the students,
but he did not prove that his experimental course produced the change.

An excellent example of the “fallacy of large numbers” can be found in
some of the analyses of the results of the intelligence tests given to United
States soldiers in World War I. The tests were given to hundreds of
thousands of men, so that the standard deviation of the mean of any
subgroup is very small. Even a rather small difference between two
subgroups will therefore be very large in comparison with its standard
deviation, and the argument for its being statistically significant is
therefore extremely strong. The difference between the intelligence scores of
white and Negro soldiers, for example, is more than thirty times its probable
error, and this constitutes a demonstration of statistical significance which
is usually beyond the ordinary statistician's most ambitious hopes. This
has sometimes led to the uncritical conclusion that Negro intelligence is
inferior to that of whites. Subsequent investigations, however, have
indicated that alternative explanations should be investigated. In
particular, many of the questions used in the intelligence tests were based upon
the assumption that the examinee is familiar with the household objects
of everyday life in an average American home, and it is likely that this
assumption is not fulfilled for people of very low economic status.

For an even more striking example, let us look again at the Literary
Digest presidential poll. Both because of the dangers of faulty sampling
and because of the possibility of a systematic change of opinion between the
poll and the actual election, it would be unrealistic to expect to be able to
predict the actual outcome much more closely than 3 or 4 per cent, and this
precision could be reached with a well-selected sample of about 500 in each
state, according to equation 11-7-1. Since the investigators actually used
a sample of more than two million (for all states combined), we must
conclude that their procedure was very wasteful, and that furthermore the
apparently high precision attainable by so large a sample led them to
overvalue their results: a typical example of the “fallacy of large
numbers.”

It should not be supposed from these remarks that the theoretical
precision has no meaning. With a sample of two million, divided into two
approximately equal groups, the standard deviation of the percentage in
each group is 0.035 per cent, by 11-7-1. This represents the precision with
which the investigators can expect to duplicate their original results if they
repeat their experiment under the same conditions. In this case, however,
the faulty sampling methods necessarily lead to a wrong result, and by
using a large sample the investigators are merely measuring the wrong
quantity with a higher and higher precision.
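
The figure of 0.035 per cent can be checked directly. The sketch below
assumes that equation 11-7-1 is the usual binomial formula for the standard
deviation of a proportion, sqrt(p(1 - p)/N); that reading of the equation,
and the function name, are assumptions made for illustration.

```python
import math


def percentage_std_dev(p: float, n: int) -> float:
    """Standard deviation, in per cent, of a sample proportion.

    Assumes simple random sampling and the binomial formula
    sqrt(p * (1 - p) / n), taken here (as an assumption) to be the
    content of equation 11-7-1.
    """
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# Two million ballots divided into two roughly equal groups (p = 0.5).
digest_poll = percentage_std_dev(0.5, 2_000_000)

# A well-selected sample of about 500 in a single state.
state_sample = percentage_std_dev(0.5, 500)

print(f"two million ballots: {digest_poll:.3f} per cent")
print(f"sample of 500:       {state_sample:.1f} per cent")
```

The first figure measures only how closely the poll would repeat itself; it
says nothing about whether the mailing list represented the voters.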

A special case of this fallacy occurs in the interpretation of the results
of scientific measurements. When a scientist measures a single quantity
by finding the mean of a number of repeated measurements, he usually
computes the standard deviation of the mean, and, from it, the probable
error. It is customary to accept such a result as indicating that the
probability is one-half that the true value will lie within this distance of the
scientist's value. This is, however, a stronger statement than is justified
by the data; instead we can only say that the probability is one-half that
the mean of the universe of measurements made in the same way will lie
within this distance from the scientist's value. To equate the two
statements we must assume that the mean of the universe of similar
measurements is the same as the true value of the quantity which he is attempting
to measure, and this is true only if the errors of measurement are random.
If there is a systematic error, that is, an error which is always of the same
size and in the same direction for all measurements, then the assumption
is not valid, and the mean of the universe of measurements will differ from
the true value of the quantity being measured. In such cases a small
probable error obtained by combining a large number of observations is
very misleading. It is paradoxical that a scientist can control the effects
of a random (or unpredictable) error by statistical analysis, but has no
defense against a systematic (or potentially predictable) error except
through common sense study of his equipment and methods.
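
The point can be illustrated with a small simulation. This is only a sketch:
the true value, the size of the systematic error, and the normal distribution
of the random errors are all hypothetical choices, and the probable error is
taken as 0.6745 times the standard deviation of the mean.

```python
import random
import statistics


def simulate_measurements(true_value, systematic_error, random_sd, n, seed=0):
    """Return n simulated readings: true value + fixed offset + random noise."""
    rng = random.Random(seed)
    return [true_value + systematic_error + rng.gauss(0.0, random_sd)
            for _ in range(n)]


def mean_and_probable_error(readings):
    """Mean of the readings and the probable error of that mean."""
    n = len(readings)
    mean = statistics.fmean(readings)
    sd_of_mean = statistics.stdev(readings) / n ** 0.5
    return mean, 0.6745 * sd_of_mean


TRUE_VALUE = 10.0        # hypothetical quantity being measured
SYSTEMATIC_ERROR = 0.5   # hypothetical fixed offset present in every reading

for n in (10, 1_000, 100_000):
    readings = simulate_measurements(TRUE_VALUE, SYSTEMATIC_ERROR, 1.0, n)
    mean, pe = mean_and_probable_error(readings)
    print(f"n={n:>6}  mean={mean:.3f}  probable error={pe:.4f}  "
          f"actual error={mean - TRUE_VALUE:+.3f}")
```

As the number of readings grows, the probable error shrinks toward zero while
the actual error settles near the fixed offset, which no amount of averaging
removes.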

7. EXCESSIVE REFINEMENT OF WEAK DATA 

This paragraph again concerns a topic which overlaps several of the
preceding ones, namely, the tactical error of applying high-precision
methods to basically poor data. In itself this is not a fallacy, since it does
not necessarily lead to erroneous conclusions, but leads only to wasted
time and effort. In its simplest and most harmless form, it is exemplified
by the scientist who publishes a probable error to three decimals in a field
where there is a strong likelihood that a systematic error will negate even
the first decimal. In the problems of fish populations described in Article
3, the method now in use (1951) involves an application of the principle
of maximum likelihood to each day's catch, so that the computation of
the population for an entire season's catch requires many hours for each
lake studied. Since the failure of the fish to mix thoroughly renders the
entire computation highly uncertain, it would be better to average the
results for the season and apply a single computation.

This fallacy has been commented upon very pungently by Sir Josiah
Stamp, as follows: “Harold Cox, when a young man in India, quoted some
Indian statistics to a judge. The judge replied, ‘Cox, when you are a bit
older, you will not quote Indian statistics with that assurance. The
government are very keen on amassing statistics: they collect them,
add them, raise them to the nth power, take the cube root and prepare
wonderful diagrams. But what you must never forget is that every one
of those figures comes in the first instance from the chowky dar (village
watchman), who just puts down what he damn pleases.’ ”*

8. SOME USEFUL PRECAUTIONS 

Some of the errors described in the preceding articles can be avoided by
specialized techniques which will be described here, while others require
no more than common sense and an alertness to the dangers of carelessness.
The following suggestions will include both of these.

A. Keep in touch with the source of your data. If you are using data
gathered by other investigators, make certain that you understand exactly
what each number means and learn as much as you can about the exact
process by which it was gathered. If you plan to collect your own data,
then make certain that the principle of classification is specific and see
that it includes exactly stated criteria for borderline cases. If your
investigation requires the questioning of people, make the questions
unambiguous and objective, and plan to ask them always in the same way.
It may be desirable to test the questions on a small experimental group
before deciding on their final form.

B. Plan your sampling technique so that every factor which might
possibly contribute a bias is eliminated. If, for example, you wish to
draw a sample of fifty names from an alphabetical list of 500, do not take
the first fifty on the list. The first fifty might possibly contain, for instance,
a large number of people named “Anderson,” and the people with this name
might include a large fraction of people of Scandinavian descent, who
in turn might differ significantly from the population as a whole. Rather
than investigate each of these rather remote possibilities, it is easier to
avoid any possibility of bias by choosing names distributed uniformly
throughout the alphabet.

*Reprinted from Some Economic Factors in Modern Life by Sir Josiah Stamp, by
permission of Staples Press, Ltd.

This problem is so important that statisticians have devised a special
technique for combatting it, consisting of choosing a separate sample from
each subgroup which might be suspected of differing from the entire
population. If, for example, a public opinion poll is to be taken in a
community in which 30 per cent of the population is Catholic, and if the
investigator believes that there is any possibility that Catholic opinion
might differ from the non-Catholic opinion on the point under
investigation, then he should take care to select 30 per cent of his sample from the
Catholic population and 70 per cent from the non-Catholic population.
This is called stratified sampling.
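
A minimal sketch of the proportional allocation just described is given
below. The community, its size, and the function name are hypothetical;
only the 30/70 division comes from the example above.

```python
import random


def stratified_sample(strata, sample_size, seed=0):
    """Draw a proportional stratified sample.

    `strata` maps a stratum name to the list of its members. Each stratum
    contributes a share of the sample equal to its share of the population
    (rounded to the nearest whole person), drawn at random within the stratum.
    """
    rng = random.Random(seed)
    population = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        quota = round(sample_size * len(members) / population)
        sample[name] = rng.sample(members, quota)
    return sample


# Hypothetical community of 1,000 people, 30 per cent of them Catholic.
community = {
    "Catholic": [f"C{i}" for i in range(300)],
    "non-Catholic": [f"N{i}" for i in range(700)],
}
poll = stratified_sample(community, sample_size=100)
print({name: len(chosen) for name, chosen in poll.items()})
```

Because each quota is rounded separately, the quotas may not add up exactly
to the requested sample size when the proportions are awkward; a careful
implementation would adjust the largest stratum to compensate.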

An interesting contribution to the problem of avoiding bias has been
offered by Professor Kinsey in The Sexual Behavior of the Human Male.*
Professor Kinsey and his associates interviewed a large number of students
in American colleges concerning their sexual behavior, the students being
selected on the basis of their having volunteered for the interviews. It
has been suggested that the results of the study are not representative of
college students in general, since the sample consisted only of those students
who were willing to discuss their sexual behavior, and the behavior of the
more reticent students might be expected to be somewhat different. To
meet this criticism, Professor Kinsey selected a few small colleges for
saturation questioning, and by appealing for cooperation he was able to
interview the entire male student body. The results for this “one hundred
per cent sample” did not differ appreciably from the preceding results
from a volunteer sample, and the validity of his original sampling technique
was thus established.

*W. B. Saunders Co., 1948.

C. Analyze causes. If you are planning an experiment of your own,
in which you wish to measure the effect of one variable upon another, then
plan the experiment in such a way that any other possible causes of the
variation are excluded. A widely used and well-known method for
accomplishing this is the use of a “control group,” a simple example of which
is the following. A large university recently undertook to test the
effectiveness of a proposed immunization against common colds. Half of the
students who volunteered for the experiment were given an inoculation of
the vaccine, and the other half, without knowing it, were given an
inoculation of water instead. Any difference in the number of colds
subsequently reported by the two groups must then have been due to the
inoculation. If the control group had not been used, any apparent decrease
in the number of colds might have been attributed to the milder weather,
or to subjective effects in the reports made by the students, or to any of
several possible causes other than the treatment. The graduate student
mentioned in Article 6 could have validated his experiment by using a
control group.
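
The division into a treated half and a control half can be sketched as
follows. The text says only that half of the volunteers received the vaccine;
choosing that half at random is an assumption added here, and the roster and
function names are hypothetical.

```python
import random


def assign_groups(volunteers, seed=0):
    """Randomly split volunteers into a treatment half and a control half."""
    rng = random.Random(seed)
    shuffled = list(volunteers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def colds_per_person(cold_counts, group):
    """Average number of colds reported by the members of one group."""
    return sum(cold_counts[person] for person in group) / len(group)


volunteers = [f"student-{i}" for i in range(200)]   # hypothetical roster
vaccinated, placebo = assign_groups(volunteers)
print(len(vaccinated), len(placebo))
```

Once the season's reports are in, `colds_per_person` would be applied to each
group in turn, and the difference between the two averages judged against its
standard deviation in the usual way.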

If you are using statistical data which were not obtained under control
group conditions and if the cause of the observed effect is in doubt, then
the use of partial correlation will often aid in eliminating or substantiating
groups of possible causes. And finally, if you have not sufficient data for
this and can obtain only a simple correlation coefficient, you should
remember in reporting your results that the correlation coefficient measures
only the strength of the relationship and that any statement you make
about the cause of the relationship is an opinion, which must be based
upon your knowledge of the situation and not upon your statistical analysis.
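
As a reminder of what partial correlation computes, the sketch below uses
the standard first-order formula, in which the correlation between x and y is
adjusted for a third variable z. The function name and the sample
coefficients are hypothetical.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with the third variable z held constant.

    Standard first-order formula:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2)).
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical case: x and y are each correlated 0.6 with z and 0.36 with
# each other, so the whole of their relationship is accounted for by z.
print(partial_correlation(0.36, 0.6, 0.6))
```

A partial correlation near zero, as in this example, suggests that the third
variable may account for the observed relationship; it still does not by
itself establish the cause.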

I. A FOUR-PLACE LOG TABLE (Concluded)

 N |    0     1     2     3     4     5     6     7     8     9 | 1 2 3 4 5 6 7 8 9
---+------------------------------------------------------------+------------------
80 | 9031  9036  9042  9047  9053  9058  9063  9069  9074  9079 | 1 1 2 2 3 3 4 4 5
81 | 9085  9090  9096  9101  9106  9112  9117  9122  9128  9133 | 1 1 2 2 3 3 4 4 5
82 | 9138  9143  9149  9154  9159  9165  9170  9175  9180  9186 | 1 1 2 2 3 3 4 4 5
83 | 9191  9196  9201  9206  9212  9217  9222  9227  9232  9238 | 1 1 2 2 3 3 4 4 5
84 | 9243  9248  9253  9258  9263  9269  9274  9279  9284  9289 | 1 1 2 2 3 3 4 4 5
85 | 9294  9299  9304  9309  9315  9320  9325  9330  9335  9340 | 1 1 2 2 3 3 4 4 5
86 | 9345  9350  9355  9360  9365  9370  9375  9380  9385  9390 | 1 1 2 2 3 3 4 4 5
87 | 9395  9400  9405  9410  9415  9420  9425  9430  9435  9440 | 1 1 2 2 3 3 4 4 5
88 | 9445  9450  9455  9460  9465  9469  9474  9479  9484  9489 | 0 1 1 2 2 3 3 4 4
89 | 9494  9499  9504  9509  9513  9518  9523  9528  9533  9538 | 0 1 1 2 2 3 3 4 4
90 | 9542  9547  9552  9557  9562  9566  9571  9576  9581  9586 | 0 1 1 2 2 3 3 4 4
91 | 9590  9595  9600  9605  9609  9614  9619  9624  9628  9633 | 0 1 1 2 2 3 3 4 4
92 | 9638  9643  9647  9652  9657  9661  9666  9671  9675  9680 | 0 1 1 2 2 3 3 4 4
93 | 9685  9689  9694  9699  9703  9708  9713  9717  9722  9727 | 0 1 1 2 2 3 3 4 4
94 | 9731  9736  9741  9745  9750  9754  9759  9763  9768  9773 | 0 1 1 2 2 3 3 4 4
95 | 9777  9782  9786  9791  9795  9800  9805  9809  9814  9818 | 0 1 1 2 2 3 3 4 4
96 | 9823  9827  9832  9836  9841  9845  9850  9854  9859  9863 | 0 1 1 2 2 3 3 4 4
97 | 9868  9872  9877  9881  9886  9890  9894  9899  9903  9908 | 0 1 1 2 2 3 3 4 4
98 | 9912  9917  9921  9926  9930  9934  9939  9943  9948  9952 | 0 1 1 2 2 3 3 3 4
99 | 9956  9961  9965  9969  9974  9978  9983  9987  9991  9996 | 0 1 1 2 2 3 3 3 4



II. A FEW SIX-PLACE LOGS

    N       log N
  1.005   0.002166
  1.010   0.004321
  1.015   0.006466
  1.020   0.008600
  1.025   0.010724
  1.030   0.012837
  1.035   0.014940
  1.040   0.017033
  1.045   0.019116
  1.050   0.021189
  1.055   0.023252
  1.060   0.025306



III. AMERICAN EXPERIENCE MORTALITY TABLE
(Based on 100,000 living at age 10)

Age  Surviving   Age  Surviving   Age  Surviving
 10   100,000     40    78,106     70    38,569
 11    99,251     41    77,341     71    36,178
 12    98,505     42    76,567     72    33,730
 13    97,762     43    75,782     73    31,243
 14    97,022     44    74,985     74    28,738
 15    96,285     45    74,173     75    26,237
 16    95,550     46    73,345     76    23,761
 17    94,818     47    72,497     77    21,330
 18    94,089     48    71,627     78    18,961
 19    93,362     49    70,731     79    16,670
 20    92,637     50    69,804     80    14,474
 21    91,914     51    68,842     81    12,383
 22    91,192     52    67,841     82    10,419
 23    90,471     53    66,797     83     8,603
 24    89,751     54    65,706     84     6,955
 25    89,032     55    64,563     85     5,485
 26    88,314     56    63,364     86     4,193
 27    87,596     57    62,104     87     3,079
 28    86,878     58    60,779     88     2,146
 29    86,160     59    59,385     89     1,402
 30    85,441     60    57,917     90       847
 31    84,721     61    56,371     91       462
 32    84,000     62    54,743     92       216
 33    83,277     63    53,030     93        79
 34    82,551     64    51,230     94        21
 35    81,822     65    49,341     95         3
 36    81,090     66    47,361
 37    80,353     67    45,291
 38    79,611     68    43,133
 39    78,862     69    40,890

VI. CHI-SQUARE TABLE

Degrees of freedom   P = 0.50   P = 0.30
         9              8.34      10.66
        10              9.34      11.78

Degrees of freedom   P = 0.10   P = 0.05   P = 0.02   P = 0.01   P = 0.001
         1              2.71       3.84       5.41       6.63      10.83
         2              4.60       5.99       7.82       9.21      13.81
         3              6.25       7.81       9.84      11.34      16.27
         4              7.78       9.49      11.67      13.28      18.46
         5              9.24      11.07      13.39      15.09      20.52
         6             10.64      12.59      15.03      16.81      22.46
         7             12.02      14.07      16.62      18.47      24.32
         8             13.36      15.51      18.17      20.09      26.12
         9             14.68      16.92      19.68      21.67      27.88
        10             15.99      18.31      21.16      23.21      29.59
        11             17.27      19.67      22.62      24.72      31.26
        12             18.55      21.03      24.05      26.22      32.91
        13             19.81      22.36      25.47      27.69      34.53
        14             21.06      23.68      26.87      29.14      36.12
        15             22.31      25.00      28.26      30.58      37.70
        16             23.54      26.30      29.63      32.00      39.25
        17             24.77      27.59      30.99      33.41      40.79
        18             25.99      28.87      32.35      34.80      42.31
        19             27.20      30.14      33.69      36.19      43.82
        20             28.41      31.41      35.02      37.57      45.31
        21             29.61      32.67      36.34      38.93      46.80
        22             30.81      33.92      37.66      40.29      48.27
        23             32.01      35.17      38.97      41.64      49.73
        24             33.20      36.41      40.27      42.98      51.18
        25             34.38      37.65      41.57      44.31      52.62
        26             35.56      38.88      42.86      45.64      54.05
        27             36.74      40.11      44.14      46.96      55.48
        28             37.92      41.34      45.42      48.28      56.89
        29             39.09      42.56      46.69      49.59      58.30
        30             40.26      43.77      47.96      50.89      59.70





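VII. REFERENCE TABLE OF FORMULAS

Exponents

3-3-1   $A^x A^y = A^{x+y}$
3-3-2   $\dfrac{A^x}{A^y} = A^{x-y}$
3-3-3   $(A^x)^y = A^{xy}$
3-3-4   $A^1 = A$
3-3-5   $A^0 = 1 \quad (A \neq 0)$
3-3-6   $A^{-n} = \dfrac{1}{A^n} \quad (A \neq 0)$
3-3-7   $A^{1/n} = \sqrt[n]{A}$

Arithmetic Mean

3-10-1  $\bar{x} = \dfrac{\sum x}{N}$
4-2-1   $\bar{x} = \dfrac{\sum fx}{N}$  (for frequency tabulations)
3-10-2  $\overline{x + y} = \bar{x} + \bar{y}$
3-10-3  $\overline{x - y} = \bar{x} - \bar{y}$
3-10-4  $\overline{Cx} = C\bar{x}$
3-10-5  $\overline{C} = C$
4-4-2   $\bar{x} = x_0 + \overline{x - x_0}$
4-5-4   $\bar{x} = x_0 + C\bar{u}$

Standard Deviation

4-3-2   $\sigma = \sqrt{\overline{(x - \bar{x})^2}}$
4-4-1   $\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$
4-4-3   $\sigma = \sqrt{\overline{(x - x_0)^2} - \left(\overline{x - x_0}\right)^2}$
4-5-5   $\sigma = C\sqrt{\overline{u^2} - \bar{u}^2}$

Probability

5-2-1   P(A) =
5-2-3   $P(\text{not } A) = 1 - P(A)$
5-7-1   $P(A \text{ and } B) = P(A) \times P(B \text{ if } A \text{ has occurred})$
5-8-1   $P(A \text{ or } B) = P(A) + P(B)$
5-10-2
6-3-1

Normal Curve

6-3-2   P(t) =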
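
6-4-11  $P(t) \approx$

        $P(t) = \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$

Correlation

        $D = \dfrac{\text{Expl. var.}}{\text{Tot. var.}}$

        $r = \pm\sqrt{D}$

        $r = \dfrac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \sigma_y}
           = \dfrac{\overline{(x - \bar{x})(y - \bar{y})}}{\sigma_x \sigma_y}$

9-10-1  $r = \dfrac{\overline{(x - x_0)(y - y_0)} - \overline{(x - x_0)}\;\overline{(y - y_0)}}{\sigma_x \sigma_y}$

9-11-1

9-14-5  $r = 1 - \dfrac{6\sum (m - n)^2}{N(N^2 - 1)}$

9-15-1  $\dfrac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$

12-3-3  $\sqrt{\dfrac{r_{xz}^2 - 2 r_{xz} r_{xy} r_{yz} + r_{yz}^2}{1 - r_{xy}^2}}$

12-4-4  $\dfrac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$

Regression

        $y_p = \bar{y} + \dfrac{\sigma_y}{\sigma_x}\, r\,(x - \bar{x})$

        $x_p = \bar{x} + \dfrac{\sigma_x}{\sigma_y}\, r\,(y - \bar{y})$

12-2-2  $\dfrac{z_p - \bar{z}}{\sigma_z} = A\,\dfrac{x - \bar{x}}{\sigma_x} + B\,\dfrac{y - \bar{y}}{\sigma_y}$

12-2-5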

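Reliability

10-4-1  $\sigma_{x-y} = \sqrt{\sigma_x^2 - 2\sigma_x r_{xy}\sigma_y + \sigma_y^2}$
10-6-1  $\sigma_{x+y} = \sqrt{\sigma_x^2 + 2\sigma_x r_{xy}\sigma_y + \sigma_y^2}$
10-7-4  $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{N}}$
10-5-1  $\sigma_\sigma = \dfrac{\sigma}{\sqrt{2N}}$
11-6-1  $\sigma_f = \sqrt{f\left(1 - \dfrac{f}{N}\right)}$
11-6-3  $\sigma_p =$
9-15-2  $\sigma_r = \dfrac{1 - r^2}{\sqrt{N - 1}}$
9-9-1   $S_y = \sigma_y\sqrt{1 - r^2}$
12-2-7  $S_{z \cdot xy} = \sigma_z\sqrt{1 - \dfrac{r_{xz}^2 - 2 r_{xz} r_{xy} r_{yz} + r_{yz}^2}{1 - r_{xy}^2}}$

Skewness and Kurtosis

7-4-1   $\alpha_3 = \dfrac{\overline{(x - \bar{x})^3}}{\sigma_x^3}
        = \dfrac{\overline{u^3} - 3\,\overline{u^2}\,\bar{u} + 2\bar{u}^3}{\sigma_u^3}$

7-4-2   $\alpha_4 = \dfrac{\overline{(x - \bar{x})^4}}{\sigma_x^4}
        = \dfrac{\overline{u^4} - 4\,\overline{u^3}\,\bar{u} + 6\,\overline{u^2}\,\bar{u}^2 - 3\bar{u}^4}{\sigma_u^4}$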

ANSWERS TO PROBLEMS


Chapter 2


Article 2

1. 201, 2; 202, 5; 203, 11; 204, 16; 205, 10; 206, 4; 207, 2.

2. (a) The third entry in Table 1-4-3, or 206. (b) The total number of
variates, or 50.


Article 3

Class          Table 1-4-1   Table 1-4-2
103.0-103.4         2             0
103.5-103.9         6             4
104.0-104.4        13             5
104.5-104.9         9            10
105.0-105.4         7             1
105.5-105.9         3             0

2. (a) 99.4, (b) 99.45, (c) 99.3, (d) 99.45, (e) 99.5.

3. (a) 201.5, (b) 201.


Article 5

1. The class interval should be about 4 or 5.

Article 6

1. (a) M = 69, (b) Q_1 = 61, (c) D_9 = 92, (d) P_73 = 81. Your answers should
not differ from these by more than one if your ogive is properly smoothed.

2. His percentile rating in language is 14; in mathematics, 32.


Article 7

2. (a) 46%, (b) 2%, (c) 4%, (d) 12%.


Chapter 3

Article 2

1. (a) 7, (b) 3, (c) 32.

2. (a) 2, (b) 3, (c) 9, (d) 125.

3. (a) 64, (b) 64, (c) 8, (d) 64, (e) 128, (f) 4.

Article 3

1. (a) 1/9, (b) 32, (c) 1, (d) 9, (e) 1/27.

2. (a) -3, (b) 0.5, (c) -2.5.

Article 4

1. (a) 3.5437, (b) 5.4719, (c) 6.4533 - 10, (d) 4.9900 - 10, (e) 0.5844,
(f) 8.1653, (g) 0.4732, (h) 9.9117 - 10.

2. (a) 3865, (b) 0.000,015,04, (c) 2.563, (d) 0.09416, (e) 1,944,000, (f) 9.602,
(g) 0.2937, (h) 0.003215.


Article 5

1. 7,525,000     2. 0.1727     3. 13.72     4. 18.65     5. 0.07158
6. 15.10         7. 6.396      8. 31.80     9. 0.08084   10. 45.82


Article 6

1. $5538     2. $8626     3. $16,800,000,000,000,000,000,000,000,000


Article 7

Your answers should agree with the following with an error not much larger
than one part in a thousand.

1. 7.32     2. 8.46     3. 9.34     4. 9.16     5. 1.456     6. 3.74

Article 8

Your answers should agree with the following with an error not much larger
than one part in a thousand.

1. 1635     2. 16.53     3. 55.4     4. 3.58 × 10^-8     5. 6.58 × 10^6
6. 63.5     7. 3.46 × 10^-7     8. 13.6     9. 42.9     10. 527
11. 0.0527     12. 605     13. 3.96

Article 9

4. 400     5. 15,243     6. 75,625     7. 2752.

Article 10 

4. 2 = 97; x + z = 152; yz = 956, yz = 970 

Article 11 

1. 6 and 2650 2. 31§ and 697f 



Chapter 4

Article 2

1. 98.93, 98.920. The small difference is due to the rounding-off effect of
forming a frequency tabulation.

2. 205.3. Yes, because each number in Table 2-2-1 agrees exactly with the
corresponding class mark in Table 2-2-2.


Article 3

1. 3.27     2. 3.19.


Article 4

1. 2.89.

2. x̄ = 49639.1, σ = 4.57. Equations 4-4-2 and 4-4-3 should be used.


Article 5

1. x̄ = 104.48 and σ = 0.64 for the data in 1-4-1; x̄ = 104.40 and σ = 0.43 for
the data in 1-4-2. The distributions are nearly alike in arithmetic mean but
differ greatly in dispersion.

2. x̄ = 13, σ = 3.57.

3. x̄ = 80.93, σ = 5.11.


Chapter 5

Article 2

1. (a) 0.250, (b) 0.500, (c) 0.308.

2. (a) 0.33, (b) 0.67.

3. (a) 0.0278, (b) 0.111, (c) 7.

Article 3

1. It is extremely unlikely that the kinds of fish are equally numerous in the
lake, and the statement is therefore extremely weak.

2. (a) 12/45, or 0.267, (b) 9/45, or 0.200, (c) 8/38, or 0.211. The third of
these is the most reliable, since it is based upon the most information.

3. (a) 2/7, (b) 1/6, (c) 1/6.

Article 4

1. (a) 0.84, (b) 0.63, (c) 0.16, (d) 0.00003.

2. (a) 0.27, (b) 0.038.

3. Between 66 and 67.

4. (a) 9/40, (b) 28/40.

Article 5

1. (a) $42, (b) $2273.

2. About a quarter of a cent. No, the tickets are worth only about 26¢.

3. $133.


Article 6

1. 0.24.     2. 1/36.     3. (1 - 0.3) times (1 - 0.4), or 0.42.

4. 2/3 × 2/3 × 2/3, or 0.296.

Article 7

1. (a) 3/51 or 0.059, (b) 13/204 or 0.064, (c) 19/34 or 0.559.

2. 0.12.

3. 0.72 × 0.87 × 0.91, or 0.570.

4. 0.252.


Article 8

1. 1/3.     2. 1/2.     3. 1 - 1/9, or 8/9.


Article 9

2. (a) 8! or 40,320, (b) 7! or 5,040, (c) 8 × 5 × 4 × 5! or 19,200,
(d) 8 × 3! × 4! or 1152.

3. P(15, 6) = 3,603,600.


Article 10

1. (a) 45/512 or 0.088, (b) 405/1024 or 0.396, (c) 1/1024 or 0.00098.

2. 5/3888 or 0.00129.     3. 6!/(3! 2!), or 60.

4. (a) 15/64, (b) 3/32, (c) 1/64.


Article 11

1. (150 + 20 + 1)/1296, or 0.132.

2. (0.8 × 0.6) × (1 - 0.1 × 0.25 × 0.7), or 0.472.

3. 0.995.

4. The probability that the father will survive is 0.74744; the son, 0.92608.
The son's expectation is 0.92608 times 0.74744 times $20,000, plus 0.92608
times 0.25256 times $40,000, or, in all, $23,199. The father's total
expectation is $16,054, and the college's is $747.

5. 0.403, 0.150, 0.326, 0.121.

6. 1 - (0.748)^10, or 0.925.

7. 0.20.

8. 42/165; 84/165 + 35/165, or 119/165.

9. 50/99, 49/99.


Chapter 6


Article 2

1. (a) 0.17, (b) 0.44.

2. (a) 0.033, (b) 0.019, (c) 0.21, (d) 22.

3. About $5000.


Article 3

1. P(1) is 0.250, which is about the same as the corresponding value in Figure
6-3-1.


Article 4

1. P(0) = 0.375, P(1) = 0.250, P(2) = 0.0625.


Article 5

1. P(0) = 0.399, P(1) = 0.242, P(2) = 0.054, P(2.5) = 0.018.


Article 6

1. Yes.

2. (a) No, (b) yes, (c) no. Sex and race differences in (a) and (c) violate the
first condition.

3. No. The elements are not independent and the second condition is violated.


Article 7

1. P = 0.0104 × 0.6, or 0.0062.

2. P = 0.4986 - 0.4918, or 0.0068. Yes. The method of Problem 2 is exact.

3. 0.0214.

Article 9

1. The predicted frequencies for the six classes are 1.4, 5.7, 12.6, 14.1, 8.1,
and 2.4. The disagreement is largest for the second class.

2. The predicted frequencies for the six classes are 2.6, 10.7, 23.0, 25.0, 14.3,
and 4.3.

Article 10

1. (a) More than 1100. (b) About 615.

2. The old contract is better. The old contract gives him $920 per hundred
turkeys, while the new one would give him only $872.

3. (a) About 6. (b) None. The first contract would be very unstrategic. The
second would be very safe, since the probability of a failure in any sample of
a thousand wires would be only about 0.000,003.


Chapter 7

Article 3

1. 1.09 and 3.16.

2. Figure 7-2-1 is strongly leptokurtic, with an α_4 of 4.2, and Figure 7-2-2 is
platykurtic, with an α_4 of 2.7.

Article 4

1. For Table 1-4-1, α_3 = +0.10 and α_4 = 2.4. For Table 1-4-2, α_3 = -0.39
and α_4 = 2.1.

2. 0.04, 2.4.

3. 0.48, 2.6.

Article 5

1. (a) 5.25, (b) 3.5, (c) 4.12, (d) 3.43.

2. (a) 98.70, (b) 98.74.

Article 6

1. 2.3.     2. 2.98.     3. (a) 8.33, (b) 12.8, (c) 576.

4. (a) 3.4, (b) PE from equation 6-8-1 is 3.45.

Article 7

The following results have been read from a smoothed ogive. Your results may
differ slightly from them.

1. 0.13.     2. 0.18.     3. (a) 0.42, (b) 0.62.

Article 8

1. 0.25.


Chapter 8

Article 4

1. 4.     2. -2.     3. -1.     4. 0.

Article 5

1. y = -4x + 4.     2. y = x + 2.     3. y = f x - 1.

Article 7

1. A graphical answer: y = 3.3x - 5.0, where x is the entrance examination
score and y is the subsequent mathematics grade. Your answer may differ
somewhat from this.

2. The best possible line is y = 1.5x + 9.2. Your result should differ from this
only slightly if your curve is well drawn.

3. Line (a) is better. For it, Σ(y - y_p)^2 is 3.00, while that for line (b) is
3.64.

Article 8

1. The least squares line is y = 1.5x + 9.2. For it, Σ(y - y_p)^2 is 2.80.

2. y = 2.8x + 27.


Article 9

1. See text.     2. y_p = 926/x + 2.


Chapter 9

Article 2

1. 50, 0.44, 0.56.     2. 80, 0.11, 0.89.     3. 45, 50, 0.10.

4. 15, 5, 0.25.     5. 30, 50, 0.60.     6. 6, 20, 0.70.

Article 3

1. 0.66.     2. 0.33.     3. 0.95.     4. 0.87.     5. 0.77.     6. 0.84.

Article 4

1. 64%.     2. 0.60.     3. Yes. If temperature is related to rainfall, equation
9-4-1 will not be applicable.


Article 6

2. r = 0.97, D = 0.94, A = 0.06.

3. r = 0.82, D = 0.67, A = 0.33.


Article 7

1. H_p = 2.8A + 27, where H is height and A is age.

2. 63 inches.     3. A_p = 0.24H - 3.     4. 8 years.     5. P_p = 1.5A + 9.2.


Article 8

1. 26.     2. 2.68.     3. (a) 0.24, (b) 0.26, (c) 0.07, (d) less than 0.00003.


Article 9

1. S_H = 2.8.     2. 0.85.     3. S_A = 0.8.     4. 0.006.

Article 10

2. -0.75.     3. 0.56, 0.44.     4. R_p = 0.31 - 0.26(B - 29.1), where R is
rainfall and B is barometric pressure. The parenthesis on the right has been
retained to avoid the use of large numbers in applying the equation.

5. 0.60 inch.     6. S_R = 0.19 inch, P = 0.02.     7. 0.99.


Article 11

1. r_EM = 0.78.     2. r_EL = 0.57.     3. r_XM = -0.41.     4. r_XL = 0.52.

5. (a) Jackson's mathematics grade, as predicted from his entrance exam score,
is 57.2. (b) The standard error of estimate is 8.4, and the probability that
Jackson will earn 60 or above, in spite of the prediction, is 0.37. (c) r_EM is
0.78, while r_EL is only 0.57. The entrance test therefore predicts 61% of the
variance in mathematics grades and only 32% of the variance in language
grades. (d) The new test is nearly as good as the old one for predicting
language grades but much poorer for predicting mathematics grades.

6. r_EX = -0.23.

Article 12

1. 0.83.     2. W_p = 0.78H + 6.5, where W and H are the ages of husband
and wife.

3. S_W = 8.5.     4. H_p = 0.88W + 10.6.     5. W_p = 48.     6. P = 0.017.


Article 14

1. 0.55.     2. -0.70.


Article 15

1. (a) 0.65, 0.13; (b) 0.69, 0.05; (c) 0.699, 0.016; (d) 0.00, 0.14; (e) 0.198,
0.020; (f) 0.86, 0.08.

2. 0.75.     3. 0.15.     4. 0.77, 0.06.     5. 0.829, 0.011.


r Chapter 10 

Article 4 

1. = 16 5; P(M - Z, > 10) = 0 29 

2. H - W =0 5; crz> = 9 3; PE D =6 3; P(H - W > 10) = 0.15. 

Article 5 

1. x-y = Z) = 6 8. 2. crs — 6 0 3. PJ^ - 4 0. 

Article 6 

1. a = VO 02 2 + 0 02® = 0 028 

2. a = Vo 02 2 - 2 X 0 4 X 0 2~X 0~2~+~6 02~ 2 = 0 022. 

3* cr j = 26 8; PE% = 18.1 


Article 7 

1. (a) 79 3067, (b) 0 0031, (c) 0 0021; (d) 79 3067 ± 0 0021, (e) 0 015 

2 . The probability that the true value will he between 24 50 and 25 00 is 0 997; 
between 24 70 and 24 80 it is only 0 44 

3. (a) P = 0 0007, (b) P<0 000,000,8 

4. (VI and VIII) The risk to the manufacturer can be made negligible by specify- 
ing a sample size of ten or more (VII) We have seen from Problem 6-10-3 
that about 0 6% will break at 200 pounds The contract would clearly be a 
nsky one for the manufacturer 

5. (a) From equation 10-7-4 we see that he can reduce his error to half of its 
former value by taking the mean of four times as many measurements He 
must therefore take the mean of twenty measurements of each angle, (b) He 
can reduce the effect of only the random errors m this way. The limit of accu- 
racy will probably be set by the presence of systematic errors. 
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Chapter 11 

Article 2 

1. The probability that so large a difference would occur by chance is less than 

0 00006 We must conclude that the treatment has reduced temperatures 

2 . The probability that so large a difference will occur by chance is 0 35, and we 
conclude that the data fail to prove that a significant difference exists 

3 . The probability of so large a difference, under the null hypothesis, is only 

0.003 Further samples should be expected to show a similar difference 

Article 3 

1. In Problem 1 the difference is proved to be significant at the 0 0001 level (or at 
the 0 00006 level) In Problem 3, it is proved to be significant at the 0.01 
level (or at the 0 003 level) Using criteria 11-3-1, both are “certainly sig- 
nificant ” Using cntena 11-3-2, the difference m temperature is “certainly 
significant” and the difference m heights is “probably significant.” y 

Article 4 

1. 117. 2 . 203 3 . 44,177. 4 . 36,50. 

Article 5 

1. Since <r ff is 2.7, the observed a differs from its expected value by only about 

1 1 in t units These data do not therefore prove that there is a difference in 
variability 

2. If this difference should be real, it would require approximately 53 cases to
prove it at the 0.01 level, or 87 to prove it at the 0.001 level.

3. The difference of the standard deviations is 0.3, and the standard deviation
of this difference is 0.65. There is no significant difference in variability.

4. σ_σ for the untreated patients is 0.072, and for the treated patients, 0.068.
Assuming that the variability is the same for the two groups, the standard
deviation of the difference between the standard deviations of two such groups
is 0.099. The actual difference between the two standard deviations is 0.21.
The actual difference is therefore 2.1 t units away from the expected value,
and the probability that this will occur by chance is only 0.034. We conclude
that the difference in variability is significant.

5. The difference between the two standard deviations is 0.2, and the standard
deviation of this difference is 0.071. The difference is therefore 2.8 t units
away from its expected value. The probability that so large a difference will
occur by chance is 0.004, and we conclude that the difference in the standard
deviations is significant.
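
The figures quoted in answer 4 can be reproduced with the following Python sketch. It rests on two assumptions that the answer itself does not spell out: that the standard deviation of a difference between two independent quantities is the square root of the sum of their squared standard deviations, and that the probability quoted is the two-sided tail area of the normal curve. The function and variable names are illustrative.

```python
import math


def two_sided_p(t_units: float) -> float:
    """Two-sided normal tail probability for a deviation of t_units (assumed test)."""
    return math.erfc(abs(t_units) / math.sqrt(2))


sd_untreated = 0.072   # standard deviation of sigma, untreated patients
sd_treated = 0.068     # standard deviation of sigma, treated patients
observed_diff = 0.21   # actual difference between the two standard deviations

# Standard deviation of the difference between two independent estimates.
sd_diff = math.hypot(sd_untreated, sd_treated)

# Size of the observed difference in t units, and its chance probability.
t_units = observed_diff / sd_diff
p = two_sided_p(t_units)

print(f"sd of difference = {sd_diff:.3f}")
print(f"t units          = {t_units:.1f}")
print(f"probability      = {p:.3f}")
```

Under these assumptions the three printed values agree, to the rounding used in the answer, with 0.099, 2.1, and 0.034.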

Article 6 

1. The standard deviation of these frequencies is about 30, while the difference in
frequencies is only 33. Such a difference could easily have occurred by chance.

2. The standard deviation of the usual frequency is 3.9, and the difference is 13.
This is a difference of 3.3 in t units, and the probability that such a difference
would occur by chance is only 0.001. We must conclude that a new cause of
typhoid, not previously present, is operating.

3. The standard deviations of the actually counted frequencies are 16.3 and 16.6,
while the difference between these frequencies is 12. The difference is not
significant.

Article 7 

1. In the first class, the percentage of failures was 42%, with a standard
deviation of 5.7%. In the second class it is 13%, with a standard deviation of 3.7%.
The standard deviation of the difference between the two percentages is 6.8%.
The actual difference is 29%, or 4.3 t units. This difference is almost impossible
on the hypothesis of chance (P < 0.00002), and the difference is certainly
significant.

2. The standard deviations of the two percentages are 3.4% and 3.1%. Since
the difference between the two percentages is only 2%, it is not significant.

3. About 3500.   4. (a) No, (b) yes, (c) yes, (d) yes.   5. No.

6. (a) Yes. (b) The court voted “for” in 43% of the cases in the first session and
53% in the second. Frankfurter voted “for” in 17% of the cases in the first
session. Allowing for changes in the nature of the cases (as evidenced by the
change in overall vote) he would be expected to vote “for” in 21% of the cases
in the second session. His actual vote in the second session is 45% “for”, which
exceeds the expected vote by six times its standard deviation. We conclude
that Frankfurter’s change of attitude is real and is independent of any changes
in the nature of the cases.
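
The comparison of two percentages in answer 1 of this article follows the same pattern and can be sketched in Python as below. The sketch assumes that the two classes are independent, so that the standard deviations of the percentages combine as a root sum of squares, and that the chance probability is a two-sided normal tail area. The variable names are illustrative, and the class sizes are not needed because the answer already gives the standard deviations.

```python
import math

pct_first, sd_first = 42.0, 5.7     # failures in the first class, in percent
pct_second, sd_second = 13.0, 3.7   # failures in the second class, in percent

# Standard deviation of the difference between two independent percentages.
sd_diff = math.sqrt(sd_first**2 + sd_second**2)

# Observed difference expressed in t units.
diff = pct_first - pct_second
t_units = diff / sd_diff

# Two-sided normal tail probability (an assumption about the test used).
p = math.erfc(t_units / math.sqrt(2))

print(f"sd of difference = {sd_diff:.1f}%")
print(f"t units          = {t_units:.1f}")
print(f"probability      = {p:.6f}")
```

With these inputs the standard deviation of the difference rounds to 6.8% and the difference to 4.3 t units, and the tail probability is of the order of 0.00002, in line with the answer.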

Article 8 

1. P < 0.2; the die is probably defective.

2. If we assign a half-vote for cases in which a Justice abstained from voting, we
find a total of 47 “for” votes. On the hypothesis that the attitudes of the
Justices are alike, their predicted number of “for” votes is 47/9, or 5.22, for
each Justice. Comparing this with the actual number of their “for” votes,
we find that P < 0.001, and the hypothesis is totally untenable.

3. We find that P < 0.01, and the hypothesis is untenable.

4. Here we find that P = 0.2, and the hypothesis is acceptable.

5. P < 0.02. The temperatures almost certainly could not have come from the
same universe.

6. P > 0.50; distribution is normal within expected limits of sampling error.

7. P > 0.50; distribution is normal.


Chapter 12 

Article 2 

3. -0.5.   4. 0.0005.   5. M_v = 0.137Z + 2.088P + 47.1.

6. 7.8. If the old test is used alone, S is 8.4, so that the improvement is small and
the use of the new test is probably not justified for this purpose.

7. 7.0. If the old test is used alone, S is 12.1. This large improvement in S
indicates that it would be desirable to retain both tests and use them jointly
for predicting language grades.
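
The regression equation given in answer 5 of this article can be evaluated with a small Python sketch. The coefficients are those of the answer. The function name and the sample scores are hypothetical, and the meaning of Z and P is whatever the problem statement defines, which is not reproduced here.

```python
def predicted_mean(z: float, p: float) -> float:
    """Predicted mean from the two-predictor equation 0.137 Z + 2.088 P + 47.1."""
    return 0.137 * z + 2.088 * p + 47.1


# Hypothetical scores, chosen only to show how the equation is applied.
z_score = 100.0
p_score = 10.0
estimate = predicted_mean(z_score, p_score)
print(f"predicted mean = {estimate:.2f}")
```

The size of S reported in answers 6 and 7 indicates how far actual values may be expected to scatter about a prediction of this kind.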

Article 3 

2. 0.82.   3. 61%; 17%, 67%, 33%.   4. 0.88.   6. 32%, 27%; 77%, 23%.

Article 4 

1. Less than 0.01. It appears highly likely that dating affects grades only through
loss of study time.

2. r_EI.A = 0.60. This is the coefficient of correlation between education and
income which we would expect to find within any subgroup of people with the
same ability as each other. Since this is nearly as high as the simple
coefficient of correlation, it appears likely that the primary relationship is
directly between education and income.

3. 0.83.   4. 0.77.